Mac Playbook
⏱ 15 min
Open WebUI with Ollama
Deploy a full ChatGPT-like interface locally
Replaces DGX Spark: Open WebUI with Ollama
Basic idea
Open WebUI is a self-hosted web application that gives you a polished ChatGPT-style interface on top of Ollama (or any OpenAI-compatible API). Bare Ollama gives you a CLI and a raw HTTP API, useful for developers but not for everyday use. Open WebUI adds:
• Persistent conversation history stored in a local SQLite database
• Model switching from a dropdown without restarting anything
• Document upload and RAG (Retrieval-Augmented Generation) for chatting with your files
• System prompt management and model-specific presets
• Multi-user accounts if you want to share a local server with teammates
On NVIDIA/cloud setups you might use hosted frontends or managed services. On a local Mac setup, Open WebUI running against local Ollama means your conversations, documents, and model weights never leave your machine.
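To see what the "raw HTTP API" looks like without a frontend, here is a minimal call against Ollama's generate endpoint. This assumes Ollama is running locally and that `qwen2.5:7b` has already been pulled; swap in any model you have:

```shell
# One-shot completion against bare Ollama -- no history, no UI,
# just a JSON request/response over localhost.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

Everything Open WebUI does ultimately goes through calls like this one; the UI adds the conversation state, document handling, and presets on top.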
What you'll accomplish
After following this playbook you will have:
• Open WebUI running at `http://localhost:3000`
• Persistent conversation history that survives restarts
• The ability to chat with any model you have pulled in Ollama via a polished web UI
• An admin account with all data stored locally in a Docker volume
What to know before starting
How Docker containers work:: Docker runs applications in isolated environments called containers. Each container has its own filesystem, but can mount "volumes": directories on your Mac that persist after the container stops. Open WebUI runs in a container so its Python dependencies don't interfere with your system Python.
What host-gateway means:: By default, Docker containers on Mac cannot reach `localhost` of the Mac host; `localhost` inside a container refers to the container itself, not your Mac. The `--add-host=host.docker.internal:host-gateway` flag creates a DNS alias that lets the container reach your Mac's localhost, which is where Ollama is listening.
What Docker volumes are:: When you pass `-v open-webui:/app/backend/data`, Docker creates a persistent storage volume named `open-webui`. All of Open WebUI's SQLite database, uploaded documents, and user data live here. The volume persists when you stop or remove the container, so you don't lose your conversations.
What RAG is:: Retrieval-Augmented Generation means the app searches through your uploaded documents, pulls the relevant passages, and includes them in the prompt context before asking the model to respond. The model doesn't "know" your documents โ the relevant text is pasted into the prompt at query time.
Where data lives:: Everything is local. Conversations are in a SQLite database inside the Docker volume. Models are in `~/.ollama/models/`. Nothing is uploaded to any external service.
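Tying these pieces together, the standard launch command from the Open WebUI documentation looks like the following; each flag maps directly to a concept above:

```shell
docker run -d \
  -p 3000:8080 \                                      # host port 3000 -> container port 8080
  --add-host=host.docker.internal:host-gateway \      # let the container reach Ollama on your Mac
  -v open-webui:/app/backend/data \                   # named volume holding the SQLite DB and uploads
  --name open-webui \
  --restart always \                                  # come back up after Docker or the Mac restarts
  ghcr.io/open-webui/open-webui:main
```

After the image downloads and the container starts, open `http://localhost:3000` and create the first account, which becomes the admin.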
Prerequisites
• Ollama installed and running (`ollama serve` must be active; test with `curl http://localhost:11434/api/tags`)
• Docker Desktop for Mac installed and running (the whale icon in your menu bar), OR Python 3.11+ for the pip install path
• 8 GB+ unified memory
• At least one model pulled in Ollama (e.g., `ollama pull qwen2.5:7b`)
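A quick preflight check covers the first two prerequisites; each command should succeed silently (or print a confirmation) if the corresponding service is up:

```shell
# Ollama reachable? Returns JSON listing the models you have pulled.
curl -sf http://localhost:11434/api/tags

# Docker daemon running? Prints a confirmation only if it is.
docker info >/dev/null 2>&1 && echo "Docker is running"
```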
Time & risk
Duration:: 15 minutes (mostly waiting for Docker image download)
Risk level:: Low: entirely containerized; one command removes everything
Rollback:: `docker rm -f open-webui && docker volume rm open-webui`
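Because all state lives in the named volume, stopping or even removing the container does not delete your data; only removing the volume does. A sketch of the lifecycle:

```shell
docker stop open-webui              # conversations stay in the volume
docker start open-webui             # same history reappears at localhost:3000
docker volume inspect open-webui    # inspect the volume's metadata

# Full rollback: remove the container AND the volume
docker rm -f open-webui && docker volume rm open-webui
```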