Polished GUI for local LLM inference using MLX backend
Replaces DGX Spark: LM Studio / SGLang
inferenceui
Basic idea
LM Studio is a native macOS application that wraps MLX (and llama.cpp for non-MLX models) with a polished interface for browsing, downloading, and running models. It abstracts away the command-line complexity of MLX LM while still using it as the inference backend, meaning you get the same performance as running `mlx_lm.generate` directly, but through a GUI.
Think of it this way: MLX LM gives you maximum control and scriptability; LM Studio gives you the same inference speed with a point-and-click interface and no terminal required. For daily use, model exploration, or sharing local AI access with less technical teammates, LM Studio is often a better choice than the raw CLIs.
The application has two main components:
1. A model library browser and downloader connected to Hugging Face
2. A local server that exposes an OpenAI-compatible API at `localhost:1234`, the same interface used by tools like Cursor, Continue.dev, and Obsidian Copilot
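Because the server speaks the OpenAI chat-completions protocol, any OpenAI-style client can talk to it. A minimal stdlib sketch is below; it assumes the server is running with a model loaded, and the model name is a placeholder for whichever model you actually downloaded:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"

# OpenAI-style chat-completion request body; the model id here is an
# assumption -- substitute the model you have loaded in LM Studio.
payload = {
    "model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "temperature": 0.7,
}

def chat_completion(payload: dict) -> dict:
    """POST the payload to LM Studio's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running and a model loaded:
# reply = chat_completion(payload)
# print(reply["choices"][0]["message"]["content"])
```

The same request works unchanged against any other OpenAI-compatible backend, which is why tools like Cursor and Continue.dev only need a base URL to point at LM Studio.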
On Apple Silicon, LM Studio prioritizes the MLX backend for any model that has an MLX variant. For models that only exist as GGUF files, it falls back to its bundled llama.cpp engine with Metal acceleration.
What you'll accomplish
After following this playbook you will have:
• LM Studio installed as a native Mac app
• At least one model downloaded (MLX variant for best performance)
• A working chat interface with persistent conversation history
• An OpenAI-compatible local server running at `localhost:1234` that external tools can connect to
What to know before starting
MLX vs GGUF in LM Studio's model search:: When you search for a model in LM Studio, results show both MLX variants and GGUF variants. MLX variants run through Apple's MLX framework and are significantly faster on Apple Silicon. GGUF variants run through llama.cpp's Metal backend, still GPU-accelerated but typically 10–30% slower than MLX. Always choose MLX when available.
What the Q suffix means for GGUF models:: In LM Studio's model browser, GGUF models show suffixes like Q4_K_M, Q5_K_M, Q8_0. These indicate quantization level. Q4_K_M is the standard recommendation: good quality, fits a 7B model in ~6 GB RAM. Q8_0 is near-lossless but uses ~2x the RAM. If you see an MLX and a GGUF variant, pick MLX.
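The RAM figures above follow from simple arithmetic. A rough sketch, where the bits-per-weight values are approximations (K-quants keep some tensors at higher precision, so effective size sits a bit above the nominal bit width):

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate in-RAM size of the weights alone, in GB."""
    return params * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight: Q4_K_M ~4.5, Q8_0 ~8.5
q4 = weight_gb(7e9, 4.5)   # ~3.9 GB of weights
q8 = weight_gb(7e9, 8.5)   # ~7.4 GB of weights

# Add a couple of GB for KV cache and runtime overhead and the Q4_K_M
# total lands near the ~6 GB figure quoted above; Q8_0 is roughly 2x.
print(f"Q4_K_M weights: {q4:.1f} GB, Q8_0 weights: {q8:.1f} GB")
```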
How LM Studio stores models:: Models are downloaded to `~/Documents/LM Studio/Models/` by default. Each MLX model is typically 3โ20 GB depending on parameter count and quantization. You can change the storage location in LM Studio's settings.
The local server vs chat interface:: LM Studio has two separate concerns: the chat UI (for you to talk to models directly in the app) and the developer server (an HTTP API for other apps to use). You load a model for chat separately from loading a model for the server. Both can run the same model simultaneously.
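To check what the developer server currently exposes, you can query the standard OpenAI `/v1/models` endpoint. A minimal sketch, assuming the server is enabled on its default port:

```python
import json
import urllib.request

def list_loaded_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Ask the developer server which models it can serve
    (the standard OpenAI /v1/models endpoint)."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        data = json.load(resp)
    return [m["id"] for m in data["data"]]

# With the server enabled in LM Studio:
# print(list_loaded_models())
```

This is also a quick way to confirm the server half of LM Studio is up before pointing an external tool at it, independent of whatever you have loaded in the chat UI.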
Prerequisites
• macOS 13.1+ (macOS 14 Sonoma or later recommended for MLX performance)
• Apple Silicon Mac (M1 or later); Intel Macs are supported but will not use the MLX backend