QLoRA enables fine-tuning the FLUX.1-dev diffusion model on consumer hardware with under 10 GB of VRAM by combining 4-bit quantization with Low-Rank Adaptation (LoRA). The approach uses bitsandbytes for quantization, the 8-bit AdamW optimizer, gradient checkpointing, and cached latents to reduce peak memory from roughly 120 GB to about 9 GB.
Table of Contents

- Dataset
- FLUX Architecture
- QLoRA Fine-tuning FLUX.1-dev with diffusers
- FP8 Fine-tuning with torchao
- Inference with Trained LoRA Adapters
- Running on Google Colab
- Conclusion
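The memory-saving ingredients listed above can be combined in a few lines with diffusers, peft, and bitsandbytes. The sketch below shows one plausible setup; the LoRA rank, target module names, and learning rate are illustrative assumptions, not values taken from this article.

```python
import torch
import bitsandbytes as bnb
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
from peft import LoraConfig

# 4-bit NF4 quantization for the large FLUX transformer (assumed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
transformer.enable_gradient_checkpointing()  # trade compute for activation memory

# Attach small trainable LoRA matrices to the attention projections
lora_config = LoraConfig(
    r=16,                     # assumed rank
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)

# 8-bit AdamW keeps optimizer state small; only the LoRA params are trained
trainable = [p for p in transformer.parameters() if p.requires_grad]
optimizer = bnb.optim.AdamW8bit(trainable, lr=1e-4)
```

With the base weights frozen in 4-bit and only the LoRA matrices in the optimizer, the optimizer state shrinks from covering billions of parameters to a few million, which is where most of the 120 GB → 9 GB reduction comes from (alongside cached text/image latents, which avoid keeping the encoders resident during training).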