Native Ollama API integration offers advantages in setup simplicity and model management over using the OpenAI API alone. For example, Open WebUI automatically detects the server on port 11434 and lets you download models, unload them, and check their status directly from the web interface.

Lemonade Server and Ollama API

Lemonade Server has added support for the Ollama API, wiring its functionality to the /api endpoints. This allows Lemonade to start on the same port Ollama uses (11434) and to run custom llama.cpp binaries, with the binary path specified via environment variables such as LEMONADE_LLAMACPP_VULKAN_BIN or LEMONADE_LLAMACPP_ROCM_BIN. It can also serve GGUF models already downloaded with llama.cpp's -hf flag or with LM Studio, pointing at their directory via the --extra-models-dir option.
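Putting those pieces together, a launch might look like the sketch below. The environment variable and flag names come from the text above; the exact server subcommand and the paths shown are assumptions and placeholders to adapt to your setup.

```shell
# Point Lemonade at a custom llama.cpp build (Vulkan backend here;
# a ROCm build would use LEMONADE_LLAMACPP_ROCM_BIN instead).
# The path below is a placeholder.
export LEMONADE_LLAMACPP_VULKAN_BIN=/path/to/llama.cpp/build/bin

# Listen on Ollama's default port so clients like Open WebUI find the
# server, and reuse GGUF models already downloaded by LM Studio.
# The "serve" subcommand name and the models directory are assumptions.
lemonade-server serve --port 11434 --extra-models-dir ~/.lmstudio/models
```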

Integration with Open WebUI

After Lemonade Server is configured, Open WebUI should detect it automatically, populate the model list with the available GGUF and/or NPU models, and expose features otherwise exclusive to Ollama. This approach offers greater flexibility in choosing and running models: you get the Ollama API's functionality without depending on Ollama itself.
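A quick way to verify this detection path by hand is to query the Ollama-style /api/tags endpoint, which is what Ollama clients use to list local models. A minimal sketch in Python, assuming the server is listening on the default port (the base URL is an assumption):

```python
import json
import urllib.request


def parse_model_names(payload):
    """Extract model names from an Ollama-style /api/tags response.

    The /api/tags endpoint returns JSON of the form
    {"models": [{"name": "...", ...}, ...]}.
    """
    return [m["name"] for m in payload.get("models", [])]


def list_models(base_url="http://localhost:11434"):
    """Fetch and return the names of models the server advertises."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_model_names(json.load(resp))
```

If this call returns the GGUF models you pointed Lemonade at, Open WebUI should see the same list.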

For those evaluating on-premise deployments, there are trade-offs to weigh; AI-RADAR publishes analytical frameworks at /llm-onpremise for assessing these aspects.