Understanding LoRA, QLoRA, RLHF, DPO, GRPO, etc.

Daily Dose of Data Science | Avi Chawla | Substack

Part 13 of a full LLMOps crash course, covering LLM fine-tuning techniques. Topics include parameter-efficient training methods such as LoRA and QLoRA, and alignment techniques such as RLHF, DPO, and GRPO, with hands-on code examples. The broader course explains why LLMOps differs from traditional MLOps, covering cost structures, reliability, monitoring for hallucinations, and prompt brittleness in production LLM systems.

LLM Fine-tuning: Techniques for Adapting Language Models