🧪
Mac Playbook
⏱ 1 hr

FLUX LoRA Fine-tuning

Fine-tune FLUX.1 image models with LoRA on Mac

Replaces DGX Spark: FLUX.1 DreamBooth LoRA
image generation, fine-tuning

Basic idea

FLUX.1 is a state-of-the-art diffusion transformer model for text-to-image generation, developed by Black Forest Labs. LoRA fine-tuning teaches it a new subject (your dog, a specific illustration style, a product) by showing it 10-30 example images paired with text descriptions. The model adjusts its internal representations to associate your specific subject with a trigger word you choose.

The result is a small adapter file (~50-200 MB) that you apply on top of the frozen FLUX.1 base model at generation time. mflux implements FLUX.1 LoRA training natively on Apple Silicon using Metal acceleration, with no CUDA required.

What you'll accomplish

A trained LoRA adapter (~100 MB .safetensors file) that generates images of your specific subject when prompted with your trigger word. For example: "a photo of sks_dog playing in the snow" produces images of your dog rather than a generic dog. The adapter is usable with mflux-generate and compatible with other FLUX.1 tools.
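Once training finishes, applying the adapter is a single `mflux-generate` call. A sketch of the invocation (the adapter path, trigger word, and output filename are examples; flag names follow mflux's CLI, but confirm against `mflux-generate --help` for your installed version):

```shell
# Generate with the trained adapter applied on top of frozen FLUX.1 dev.
# ~/loras/sks_dog.safetensors is a placeholder path for your trained adapter.
mflux-generate \
  --model dev \
  --prompt "a photo of sks_dog playing in the snow" \
  --steps 25 \
  --seed 42 \
  --lora-paths ~/loras/sks_dog.safetensors \
  --output sks_dog_snow.png
```

The base model stays untouched on disk; swapping subjects is just a matter of pointing `--lora-paths` at a different adapter file.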

What to know before starting

Diffusion models: FLUX.1 generates images by starting from random noise and iteratively denoising it toward an image described by the text prompt. The "model" is a learned function that predicts how to remove noise at each step.
What LoRA adds to image models: LoRA injects small low-rank weight updates into the attention layers that are active during the denoising process. These updates bias the denoising trajectory toward your subject's appearance without retraining the entire 12-billion-parameter model.
Trigger words: The trigger word (`sks_dog`, `ohwx_cat`, etc.) disambiguates your subject from the general concept. Without it, the model has no way to distinguish "your dog" from "any dog." The word should be unusual; common words like "dog" confuse the model.
FLUX.1 dev vs schnell: `dev` was trained for quality and requires 20-50 steps per image. `schnell` was distilled to 4 steps: faster, but lower quality and worse at learning new subjects. Use `dev` for LoRA fine-tuning.
Training data quality is the bottleneck: Unlike LLM fine-tuning where more data is almost always better, image LoRA quality is limited by the consistency and quality of your input images. 15 excellent images outperform 50 mediocre ones.
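The low-rank idea behind the adapter can be sketched in a few lines: the frozen weight `W` is never modified, and the adapter contributes `B(Ax)` scaled by `alpha/r`, where `A` and `B` are the small trained matrices that make up the ~100 MB file. A minimal sketch with plain Python lists (real implementations apply this to the transformer's attention projections, not toy 2×3 matrices):

```python
# Minimal LoRA sketch: y = W x + (alpha / r) * B (A x)
# W is frozen; only A and B (the small adapter) are trained.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=16, r=2):
    base = matvec(W, x)              # frozen base-model path
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy dimensions: d_out=2, d_in=3, rank r=2
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.1, 0.1, 0.1],   # r x d_in, trained
     [0.0, 0.2, 0.0]]
B = [[0.0, 0.0],        # d_out x r, initialized to zero:
     [0.0, 0.0]]        # at step 0 the adapter is a no-op

x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x))  # identical to matvec(W, x) while B is zero
```

Because `B` starts at zero, training begins from exactly the base model's behavior and gradually bends it toward the new subject.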

Prerequisites

โ€ข macOS 14.0 or later
โ€ข Apple Silicon Mac (M1, M2, or M3 family)
โ€ข Python 3.10 or later
โ€ข 32 GB+ unified memory (FLUX.1 dev weights are ~34 GB)
โ€ข 20-50 GB free disk space (model weights + checkpoints)
โ€ข 10-30 training images of your subject

Time & risk

Duration: ~1 hour of setup and data prep, plus several hours of training
Risk level: Medium. The initial FLUX.1 dev model download is ~34 GB and requires a HuggingFace account. Training is memory-intensive.
Rollback: Uninstall the mflux package with pip and delete the HuggingFace cache at `~/.cache/huggingface/hub`.
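The rollback amounts to two commands (cache path from above; `pip uninstall` asks for confirmation before removing anything):

```shell
# Remove the mflux package, then the downloaded model weights.
pip uninstall mflux
# Note: this clears EVERY cached HuggingFace model, not just FLUX.1.
rm -rf ~/.cache/huggingface/hub
```

If you use other HuggingFace models, delete only the FLUX.1 subdirectories inside the hub cache instead of the whole folder.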