Alessandro Cappelli, co-founder of Adaptive ML, argues that reinforcement learning (RL) is the critical missing ingredient for taking LLM-based products from MVP to production at enterprise scale. He explains that 95% of GenAI pilots fail to reach production because prompt engineering and instruction fine-tuning lack a systematic way to integrate feedback. RL closes this gap by enabling continuous model improvement driven by business metrics, KPIs, and LLM-as-judge signals. Key benefits include smaller, cheaper, faster models with better tokenomics; Cappelli cites AT&T spending millions on call summarization alone. For agentic use cases, RL is even more advantageous, since the framework was originally designed for training agents that act in environments. Adaptive ML's platform, Adaptive Engine, abstracts away RL complexity (e.g., PPO requiring four LLMs running simultaneously) through pre-built recipes, letting enterprises define reward rubrics without implementing the algorithms themselves. Human-in-the-loop oversight is preserved through reward model design rather than expensive annotation campaigns.
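To make the "define reward rubrics, not algorithms" idea concrete, here is a minimal sketch of an LLM-as-judge reward function of the kind that could feed an RL fine-tuning loop. All names, prompts, and signatures are illustrative assumptions for this sketch; they are not Adaptive Engine's actual API.

```python
# Hypothetical sketch: turning a rubric plus an LLM judge into a scalar reward,
# the kind of signal RL fine-tuning can optimize without a human annotation
# campaign. Everything here (RUBRIC, rubric_reward, the judge callable) is
# illustrative, not a real product API.
from typing import Callable

RUBRIC = """Score the assistant's call summary from 0 to 10 on:
- factual accuracy with respect to the call transcript
- completeness of the key points
- brevity (no more than 5 sentences)
Return only the integer score."""


def rubric_reward(
    transcript: str,
    summary: str,
    judge: Callable[[str], str],  # any LLM completion function: prompt -> text
) -> float:
    """Ask the judge LLM to score the summary, normalize to a reward in [0, 1]."""
    prompt = (
        f"{RUBRIC}\n\nCall transcript:\n{transcript}\n\n"
        f"Assistant summary:\n{summary}\n\nScore:"
    )
    raw = judge(prompt).strip()
    try:
        score = float(raw)
    except ValueError:
        score = 0.0  # unparseable judge output earns no reward
    return max(0.0, min(score, 10.0)) / 10.0


if __name__ == "__main__":
    # Stub judge so the sketch runs standalone; in practice this would call
    # a hosted judge model.
    fake_judge = lambda prompt: "8"
    reward = rubric_reward(
        "Customer called about a billing error on the March invoice...",
        "The customer reported a billing error; the agent issued a credit.",
        fake_judge,
    )
    print(reward)  # 0.8
```

A scalar like this, logged per interaction, is what lets business metrics and judge scores replace manual labels as the training signal.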

18m watch time
