From x.com
Robert Youssef @rryssf_

Tencent researchers found a way to get reinforcement learning performance without updating a single parameter. It costs $18. The RL methods it outperforms cost $10,000+. The method is called Training-Free GRPO, and the core idea is more interesting than the cost savings. https://t.co/krYfosYQLj
