Mac Playbook
⏱ 15 min
Open WebUI with Ollama
Deploy a full ChatGPT-like interface locally
Replaces DGX Spark: Open WebUI with Ollama
Basic idea
Open WebUI is a self-hosted web application that gives you a polished ChatGPT-style interface on top of Ollama (or any OpenAI-compatible API). Bare Ollama gives you a CLI and a raw HTTP API, useful for developers but not for everyday use. Open WebUI adds:
• Persistent conversation history stored in a local SQLite database
• Model switching from a dropdown without restarting anything
• Document upload and RAG (Retrieval-Augmented Generation) for chatting with your files
• System prompt management and model-specific presets
• Multi-user accounts if you want to share a local server with teammates
On NVIDIA/cloud setups you might use hosted frontends or managed services. On a local Mac setup, Open WebUI running against local Ollama means your conversations, documents, and model weights never leave your machine.
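To see what the "raw HTTP API" looks like without a frontend, here is a minimal call against Ollama's generate endpoint. This assumes Ollama is running locally and that `qwen2.5:7b` has already been pulled; swap in any model you have:

```shell
# One-shot completion against bare Ollama -- no history, no UI,
# just a JSON request/response over localhost.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

Everything Open WebUI does ultimately goes through calls like this one; the UI adds the conversation state, document handling, and presets on top.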
What you'll accomplish
After following this playbook you will have:
• Open WebUI running at `http://localhost:3000`
• Persistent conversation history that survives restarts
• The ability to chat with any model you have pulled in Ollama via a polished web UI
• An admin account with all data stored locally in a Docker volume
What to know before starting
How Docker containers work:: Docker runs applications in isolated environments called containers. Each container has its own filesystem, but can mount "volumes": directories on your Mac that persist after the container stops. Open WebUI runs in a container so its Python dependencies don't interfere with your system Python.
What host-gateway means:: By default, Docker containers on Mac cannot reach `localhost` of the Mac host; `localhost` inside a container refers to the container itself, not your Mac. The `--add-host=host.docker.internal:host-gateway` flag creates a DNS alias that lets the container reach your Mac's localhost, which is where Ollama is listening.
What Docker volumes are:: When you pass `-v open-webui:/app/backend/data`, Docker creates a persistent storage volume named `open-webui`. All of Open WebUI's SQLite database, uploaded documents, and user data live here. The volume persists when you stop or remove the container, so you don't lose your conversations.
What RAG is:: Retrieval-Augmented Generation means the app searches through your uploaded documents, pulls the relevant passages, and includes them in the prompt context before asking the model to respond. The model doesn't "know" your documents โ the relevant text is pasted into the prompt at query time.
Where data lives:: Everything is local. Conversations are in a SQLite database inside the Docker volume. Models are in `~/.ollama/models/`. Nothing is uploaded to any external service.
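Tying these pieces together, the standard launch command from the Open WebUI documentation looks like the following; each flag maps directly to a concept above:

```shell
docker run -d \
  -p 3000:8080 \                                      # host port 3000 -> container port 8080
  --add-host=host.docker.internal:host-gateway \      # let the container reach Ollama on your Mac
  -v open-webui:/app/backend/data \                   # named volume holding the SQLite DB and uploads
  --name open-webui \
  --restart always \                                  # come back up after Docker or the Mac restarts
  ghcr.io/open-webui/open-webui:main
```

After the image downloads and the container starts, open `http://localhost:3000` and create the first account, which becomes the admin.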
Prerequisites
• Ollama installed and running (`ollama serve` must be active; test with `curl http://localhost:11434/api/tags`)
• Docker Desktop for Mac installed and running (the whale icon in your menu bar), OR Python 3.11+ for the pip install path
• 8 GB+ unified memory
• At least one model pulled in Ollama (e.g., `ollama pull qwen2.5:7b`)
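A quick preflight check covers the first two prerequisites; each command should succeed silently (or print a confirmation) if the corresponding service is up:

```shell
# Ollama reachable? Returns JSON listing the models you have pulled.
curl -sf http://localhost:11434/api/tags

# Docker daemon running? Prints a confirmation only if it is.
docker info >/dev/null 2>&1 && echo "Docker is running"
```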
Time & risk
Duration:: 15 minutes (mostly waiting for Docker image download)
Risk level:: Low: entirely containerized; one command removes everything
Rollback:: `docker rm -f open-webui && docker volume rm open-webui`
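Because all state lives in the named volume, stopping or even removing the container does not delete your data; only removing the volume does. A sketch of the lifecycle:

```shell
docker stop open-webui              # conversations stay in the volume
docker start open-webui             # same history reappears at localhost:3000
docker volume inspect open-webui    # inspect the volume's metadata

# Full rollback: remove the container AND the volume
docker rm -f open-webui && docker volume rm open-webui
```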