🎨
Mac Playbook
⏱ 15 min

FLUX / Stable Diffusion with MLX

Generate images with FLUX and SD models natively on Mac

Replaces DGX Spark: FLUX.1 Dreambooth LoRA
Tags: image generation, mlx

Basic idea

FLUX.1 is a state-of-the-art image generation model developed by Black Forest Labs. Unlike Stable Diffusion (which uses a U-Net denoiser), FLUX.1 is a diffusion transformer (DiT) trained with a flow-matching objective, a newer technique that learns straighter paths from noise to image, yielding better image quality and prompt adherence. mflux is a pure MLX implementation that runs FLUX.1 natively on Apple Silicon, bypassing PyTorch entirely. It uses Metal (Apple's GPU compute API) through MLX's unified memory model, achieving 30–60 seconds per 1024×1024 image on an M2 Max at 20 steps, or 8–15 seconds with the schnell (4-step) variant.
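To build intuition for why straighter flows allow fewer steps, here is a toy one-dimensional sketch (illustration only, not mflux internals): sampling integrates a learned velocity field with Euler steps, and along a perfectly straight path the velocity is constant, so even one step lands on the target.

```python
# Toy illustration (not mflux code): why straighter flows need fewer steps.
# Flow matching trains a velocity field v(x, t); sampling integrates
# dx/dt = v(x, t) from pure noise (t = 1) down to data (t = 0).

def sample(x_noise, velocity, num_steps):
    """Euler-integrate the flow from t=1 to t=0 in num_steps steps."""
    x, t = x_noise, 1.0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = x - dt * velocity(x, t)
        t -= dt
    return x

# A straight flow between data x0 and noise x1 is x_t = (1-t)*x0 + t*x1,
# so its velocity dx/dt = x1 - x0 is constant (independent of x and t).
x0, x1 = 3.0, -1.0                 # "image" and "noise" as scalars
v = lambda x, t: x1 - x0           # constant velocity of a straight path

print(sample(x1, v, num_steps=1))   # → 3.0 (one step already recovers x0)
print(sample(x1, v, num_steps=20))  # ≈ 3.0 (more steps, same destination)
```

Real learned flows are only approximately straight, which is why dev still uses ~20 steps while the distilled schnell variant gets away with 4.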

What you'll accomplish

A working local image generation setup that produces 1024×1024 images from text prompts using both FLUX.1 schnell (fast, 4 steps, no HuggingFace token needed) and FLUX.1 dev (higher quality, 20 steps). Images are saved as PNG files locally. You will understand what each command-line flag controls and be able to tune generation for your hardware.

What to know before starting

Diffusion models: Generation starts from Gaussian noise (a random tensor), then takes N denoising steps. Each step runs the model to predict and subtract a bit of noise; after N steps you have a coherent image. More steps means higher quality but slower generation.
Flow matching: FLUX's training objective. Instead of DDPM's curved noise paths, flow matching learns straighter "flows" from noise to image, so fewer steps are needed for good quality. This is why schnell works in 4 steps.
schnell vs dev: `schnell` (German: fast) is timestep-distilled down to 4 steps. It's the quick, open-weight model. `dev` is the full-quality model (20–50 steps), gated on HuggingFace; you must accept its license before downloading.
Quantization: FLUX.1 in full float16 requires ~34 GB of memory. 8-bit quantization reduces this to ~17 GB; 4-bit to ~9 GB. Quality degrades slightly at 4-bit but remains excellent for most prompts.
CFG (classifier-free guidance): Controls how strongly the model follows your prompt versus generating freely. schnell does not use CFG (guidance is baked in during distillation). dev exposes a `guidance_scale` parameter; 3.5–7.0 is the useful range.
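The CFG mechanism itself is a simple extrapolation. The following conceptual sketch (not mflux internals) shows the standard mixing formula: the model is run once without the prompt and once with it, and the guided prediction pushes from the unconditional output toward the conditional one by `guidance_scale`.

```python
# Conceptual sketch of classifier-free guidance mixing (not mflux internals).
# At each denoising step, 'uncond' is the model's noise prediction with an
# empty prompt and 'cond' is its prediction with your prompt.

def cfg_mix(uncond, cond, guidance_scale):
    """Extrapolate from the unconditional prediction toward the conditional
    one; scale 1.0 is 'just follow the prompt', larger pushes harder."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 0.0]   # prediction with an empty prompt
cond   = [1.0, -1.0]  # prediction with your prompt

print(cfg_mix(uncond, cond, 1.0))  # → [1.0, -1.0]  (pure conditional)
print(cfg_mix(uncond, cond, 3.5))  # → [3.5, -3.5]  (a typical dev setting)
```

Because CFG requires two model evaluations per step, dropping it (as schnell's distillation does) roughly halves the compute per step on top of needing fewer steps.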

Prerequisites

• macOS 14.0+ (Sonoma), required for the latest MLX Metal kernels
• Apple Silicon Mac (M1, M2, M3, or M4 series)
• Python 3.10+
• 16 GB+ unified memory (minimum for schnell 4-bit); 32 GB+ recommended for dev 4-bit
• HuggingFace account (only required for FLUX.1 dev)
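The memory figures above follow from simple arithmetic. A back-of-the-envelope check, assuming roughly 17 billion total parameters (the ~12B FLUX.1 transformer plus the ~5B T5 text encoder; treat these counts as approximations, and note real runs need extra headroom for activations):

```python
# Back-of-the-envelope check of the memory figures quoted above.
# Assumption: ~17e9 total parameters (~12B DiT + ~5B T5 text encoder).
PARAMS = 17e9

def weight_gb(bits_per_param):
    """Weights-only footprint in GB; runtime adds activation overhead."""
    return PARAMS * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gb(bits):.1f} GB")
# Prints totals close to the ~34 / ~17 / ~9 GB figures in the notes above.
```

This is why 16 GB machines should stick to schnell at 4-bit, while dev at 8-bit wants 32 GB or more.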

Time & risk

Duration: 15 minutes setup; first run downloads ~9 GB (schnell 4-bit) or ~17 GB (dev 8-bit)
Risk level: Low; no system changes, only a pip package and model downloads
Rollback: `pip uninstall mflux`; delete `~/.cache/huggingface/` to reclaim disk space