Start learning cyber security with TryHackMe: https://tryhackme.com/bycloud
Use my code "BYCLOUD25" to get 25% off an annual subscription!

This video breaks down what's wrong with scaling RL for LLMs, especially as a path to AGI, and why RL still matters. RL is noisy and can hurt generalization, yet it enables exploration and self-correction that pretraining can't, so this direction leaves us stuck between a rock and a hard place. We'll also look at why LoRA is becoming the practical way to do RL cheaply: swappable adapters can match full fine-tuning on reasoning and make personalized agents easier to deploy, which looks like a promising way to apply RL at massive scale.
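To make the "swappable adapters" idea concrete, here is a minimal pure-Python sketch, not code from the video or from any library. The class name `LoRALinear`, its methods, the adapter names, and the toy numbers are all hypothetical. It assumes the usual LoRA form y = Wx + (alpha / r) * B(Ax), where W stays frozen and only the small matrices A and B differ per adapter.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical frozen linear layer with named, swappable low-rank adapters."""

    def __init__(self, weight: Matrix, alpha: float = 1.0) -> None:
        self.weight = weight  # frozen base weights, shape (n_out, n_in)
        self.alpha = alpha    # LoRA scaling numerator (illustrative default)
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}
        self.active: Optional[str] = None

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        """Register an adapter: a has shape (rank, n_in), b has shape (n_out, rank)."""
        self.adapters[name] = (a, b)

    def use(self, name: Optional[str]) -> None:
        """Select the active adapter, or None to run the base model only."""
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)         # base path, shared by every adapter
        if self.active is None:
            return y
        a, b = self.adapters[self.active]
        rank = len(a)
        delta = matvec(b, matvec(a, x))    # low-rank update B(Ax)
        scale = self.alpha / rank
        return [yi + scale * di for yi, di in zip(y, delta)]


# Toy usage: one shared base layer, two rank-1 adapters swapped at run time.
layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("math", a=[[1.0, 0.0]], b=[[0.0], [2.0]])
layer.add_adapter("code", a=[[0.0, 1.0]], b=[[1.0], [0.0]])

x = [3.0, 4.0]
base_out = layer.forward(x)   # [3.0, 4.0]
layer.use("math")
math_out = layer.forward(x)   # [3.0, 10.0]
layer.use("code")
code_out = layer.forward(x)   # [7.0, 4.0]
print(base_out, math_out, code_out)
```

Swapping an adapter only changes which small (A, B) pair is applied. The base weights are never copied or modified, which is what would make per-user adapters cheap to store and serve.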


my latest project: Intuitive AI Academy
http://intuitiveai.academy/
code "NYNM" for 50% off forever (limited to 50)


Dwarkesh Podcast w/ AK
[YouTube] https://youtu.be/lXUZvyajciY

Dwarkesh Podcast w/ Ilya 
[YouTube] https://youtu.be/aR20FWCCjAs

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
[Paper] https://arxiv.org/abs/2506.01939

The Path Not Taken: RLVR Provably Learns Off the Principals
[Paper] https://arxiv.org/abs/2511.08567

LoRA Without Regret
[Blog] https://thinkingmachines.ai/blog/lora/

Tina: Tiny Reasoning Models via LoRA
[Paper] https://arxiv.org/abs/2504.15777

Tinker
[Website] https://thinkingmachines.ai/tinker/


My Newsletter
https://mail.bycloud.ai/

My Patreon
https://www.patreon.com/c/bycloud


Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak


[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] bycloud@smoothmedia.co
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] Abhay and @Booga04 
[Ko-fi] https://ko-fi.com/bycloudai


bycloud

Reinforcement learning (RL) has become crucial for improving LLM capabilities like coding and reasoning, even though experts claim it won't lead to AGI. RL provides a sparse signal (roughly one bit per episode) compared with dense next-token prediction, so each episode carries little information and the resulting skills can generalize less well. Recent research shows RL updates only about 5% of model weights, which makes it compatible with LoRA (Low-Rank Adaptation). When LoRA is applied to all layers, with learning rates about 10x higher than full fine-tuning and moderate batch sizes, RL training matches full fine-tuning performance while using only about two-thirds of the compute. This combination enables efficient experimentation and personalized AI agents at scale, potentially making specialized capabilities widely accessible without achieving true AGI.
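The two-thirds compute figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an approximation under stated assumptions, not a measurement: it counts 2 FLOPs per multiply-add, looks at a single weight matrix per token, and uses a hypothetical 4096 x 4096 layer with rank 32. The function names are made up for illustration. Full fine-tuning pays for a forward pass, an input-gradient pass, and a weight-gradient pass; with LoRA the base matrix is frozen, so its weight-gradient pass is skipped and only the small adapter matrices add cost.

```python
def full_ft_flops(n_in: int, n_out: int) -> int:
    """Approximate FLOPs per token for one weight matrix under full fine-tuning."""
    forward = 2 * n_in * n_out       # y = W x
    grad_input = 2 * n_in * n_out    # gradient w.r.t. the layer input
    grad_weight = 2 * n_in * n_out   # gradient w.r.t. W itself
    return forward + grad_input + grad_weight


def lora_flops(n_in: int, n_out: int, rank: int) -> int:
    """Approximate FLOPs per token for the same matrix with a LoRA adapter."""
    base_forward = 2 * n_in * n_out
    base_grad_input = 2 * n_in * n_out
    # W is frozen, so there is no weight-gradient pass for the base matrix.
    adapter_params = rank * (n_in + n_out)   # A is (rank, n_in), B is (n_out, rank)
    adapter = 3 * 2 * adapter_params         # forward + both backward passes
    return base_forward + base_grad_input + adapter


# Hypothetical layer size and rank, chosen only for illustration.
n_in, n_out, rank = 4096, 4096, 32
ratio = lora_flops(n_in, n_out, rank) / full_ft_flops(n_in, n_out)
trainable_fraction = rank * (n_in + n_out) / (n_in * n_out)

print(f"LoRA / full fine-tuning FLOPs: {ratio:.4f}")          # 0.6823
print(f"Trainable parameter fraction: {trainable_fraction:.4%}")  # 1.5625%
```

Under these assumptions the ratio comes out slightly above 2/3, and it approaches 2/3 as the rank shrinks relative to the layer width.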

The RL Irony in LLMs (And its insane new Meta)