Daily Dose of Data Science | Avi Chawla | Substack

How to Fine-Tune LLMs in 2026

Fine-tuning LLMs in 2026 no longer requires manually curated datasets or hand-crafted reward functions. The post explains Reinforcement Fine-Tuning (RFT) using GRPO (Group Relative Policy Optimization), the algorithm behind DeepSeek-R1, which trains a model by generating multiple completions per prompt and reinforcing the ones that score above the group average. ART (Agent Reinforcement Trainer) is introduced as an open-source framework that applies GRPO to real-world multi-step agents with tool calls and supports LangGraph, CrewAI, and ADK integrations. RULER (Relative Universal LLM-Elicited Rewards) eliminates the need for manual reward functions by using an LLM-as-judge to rank agent trajectories against one another. A practical notebook example trains a 3B model on MCP server tasks using automatic RULER evaluation.
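To make the "reinforce above-average completions" idea concrete, here is a minimal sketch of the group-relative advantage that GRPO computes for one prompt. It is not ART's or RULER's API: the function name `group_relative_advantages`, the `eps` parameter, and the reward values are assumptions chosen for illustration, and the sketch covers only the normalization step, not the policy update.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each completion relative to the other completions in its group.

    Hypothetical helper for illustration: subtract the group mean and divide
    by the group standard deviation, so above-average completions get a
    positive advantage and below-average ones get a negative advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std dev over the group
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four completions sampled from the same prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Because only the relative ordering within a group matters, the rewards do not need a calibrated absolute scale. That is plausibly why comparative scores from an LLM judge, as the summary describes for RULER, can stand in for a hand-written reward function.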

#python

#ai-agents

#reinforcement-learning

Apr 20 • 7m read time • From blog.dailydoseofds.com

Table of contents

How to fine-tune LLMs in 2026

12 must-use features in Claude Code

P.S. For those wanting to develop “Industry ML” expertise:
