A practical guide to LLM fine-tuning for product developers, grounded in a real code review tool use case. Covers when to choose fine-tuning over RAG or prompt engineering (behavior vs. knowledge), the differences between full fine-tuning, LoRA, and QLoRA, base model selection (Llama 3.3 70B, Qwen 2.5 Coder, Mistral Small 3, Phi-4, Gemma 3), data preparation best practices, training infrastructure options (Modal, RunPod, Axolotl, HuggingFace trl), evaluation methodology, cost math showing how eliminating large system prompts can save thousands per month, and deployment options including vLLM and Ollama. Emphasizes building an evaluation set before training and warns against fine-tuning as a cure-all.
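The "thousands per month" savings claim comes down to simple token arithmetic: a fine-tuned model that has internalized its instructions no longer needs a large static system prompt resent with every request. A minimal sketch of that math, using hypothetical figures (prompt size, request volume, and per-token price are illustrative assumptions, not numbers from the article):

```python
# Hypothetical figures for illustration -- not from the article.
def monthly_prompt_cost(prompt_tokens: int,
                        requests_per_day: int,
                        usd_per_million_tokens: float,
                        days: int = 30) -> float:
    """Monthly cost of re-sending a static system prompt with every request."""
    tokens_per_month = prompt_tokens * requests_per_day * days
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

# A 4,000-token code-review rubric sent 20,000 times a day,
# at an assumed $3 per 1M input tokens:
cost = monthly_prompt_cost(4_000, 20_000, 3.0)
print(f"${cost:,.0f}/month")  # prints "$7,200/month"
```

Under these assumptions the static prompt alone costs $7,200/month, which fine-tuning the behavior into the model would eliminate (at the price of training and serving costs, which the cost-math section weighs).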
Table of contents
- The Real Decision: Fine-Tuning vs RAG vs Prompt Engineering
- When Fine-Tuning Actually Makes Sense
- Full Fine-Tuning vs LoRA vs QLoRA
- Picking a Base Model
- Data Preparation: The Part That Determines Everything
- Training: Where and How
- Evaluating the Fine-Tuned Model
- Cost Math
- Deployment Options
- What Fine-Tuning Cannot Fix
- Start With the Evaluation