Agent Reinforcement Fine-Tuning (Agent RFT) is OpenAI's method for improving AI agents that interact with external tools by updating model weights against custom reward signals. During training, the agent explores different tool-calling strategies, and the model learns from rollouts that run against real endpoints. The technique is sample-efficient (working with as few as 10 examples), reduces latency by optimizing tool-call budgets, and helps models adapt to domain-specific environments. Case studies from Cognition, Codto, Cosine, and Macco demonstrate 5-10% accuracy improvements and significant latency reductions. Success requires well-defined tasks, training data that matches production, an explorable solution space, and robust reward functions that resist gaming.
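To make the reward-function requirement concrete, here is a minimal sketch of the kind of grader such training might use. The rollout format, field names, and penalty weights are illustrative assumptions, not OpenAI's actual grading API; it only shows how correctness and a tool-call budget can be combined into a single score that is hard to game below zero.

```python
def grade_rollout(rollout: dict, expected_answer: str,
                  tool_call_budget: int = 5) -> float:
    """Score a single agent rollout in [0, 1].

    Combines task correctness with a latency-oriented penalty for
    exceeding the tool-call budget, so the policy is rewarded for
    solving the task in fewer calls.
    """
    # Correctness: exact match on the final answer. A production
    # grader would need to be more robust (formatting, paraphrase).
    correct = 1.0 if rollout.get("final_answer") == expected_answer else 0.0

    # Efficiency: penalize each tool call beyond the budget.
    n_calls = len(rollout.get("tool_calls", []))
    overage = max(0, n_calls - tool_call_budget)
    penalty = 0.1 * overage

    # Clamp at zero so an adversarial rollout cannot exploit the
    # penalty term to produce negative rewards.
    return max(0.0, correct - penalty)
```

For example, a correct answer reached in seven tool calls against a budget of five would score `1.0 - 0.2 = 0.8`, while a wrong answer scores zero regardless of call count.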