Most teams should not fine-tune LLMs — they should fix prompts and build proper RAG pipelines first. Fine-tuning is for shaping behavior, style, and structured output (form), not injecting knowledge (facts). The recommended 2026 sequence is Prompt → RAG → Fine-tune → Distill. LoRA and QLoRA are the only practical approaches for most teams, with full fine-tuning rarely justified. For preference alignment, DPO, ORPO, and KTO replace expensive RLHF depending on available data. The winning production pattern combines fine-tuning and RAG: tune the interface (query rewriting, citation format, refusal behavior) while retrieving the content. A five-point checklist is provided to decide whether fine-tuning is warranted, with the real cost being data curation, evaluation, and 12-month lifecycle ownership rather than training compute.
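The summary above recommends LoRA/QLoRA over full fine-tuning on cost grounds. A minimal sketch of the underlying parameter arithmetic (the layer size is an illustrative assumption, not a figure from the article): a rank-r LoRA adapter replaces a full d_out × d_in weight update with two small factors B (d_out × r) and A (r × d_in), merged as W' = W + (α/r)·B·A.

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tune vs. a rank-r LoRA adapter."""
    full = d_out * d_in          # every weight in the matrix is trainable
    lora = r * (d_out + d_in)    # only the low-rank factors B and A are trainable
    return full, lora

# One 4096x4096 attention projection (typical of a 7B-class model; illustrative):
full, lora = lora_param_counts(4096, 4096, r=16)
print(full, lora, f"{lora / full:.2%}")  # → 16777216 131072 0.78%
```

At rank 16, the adapter trains well under 1% of the layer's weights, which is why LoRA fits on commodity GPUs and why QLoRA (the same trick over a quantized base model) pushes the bar lower still.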
Table of contents
1. The 2026 State of "Do I Even Need to Fine-Tune?"
2. Fine-Tuning Is for Form, Not Facts: A Decision Framework
3. PEFT: The Only Fine-Tuning Most Teams Should Do
4. Data Preparation and the Operational Tax Nobody Talks About
5. The Pattern That Actually Wins: Fine-Tune the Interface, Retrieve the Content
6. Should You Fine-Tune? A Checklist