Reinforcement learning (RL) has become a viable approach for training agentic AI systems, with companies like DeepSeek demonstrating its effectiveness at scale. The key insight is that building agents and training reasoning models are fundamentally the same problem: both involve iterative interaction loops with environments.

19m watch time
