From x.com
Robert Youssef @rryssf_
Tencent researchers found a way to get reinforcement learning performance without updating a single parameter. it costs $18. the RL methods it outperforms cost $10,000+. the method is called Training-Free GRPO, and the core idea is more interesting than the cost savings. https://t.co/krYfosYQLj
