TinyZero is a reproduction of DeepSeek R1 Zero, built on veRL. Through reinforcement learning, a 3B base LM develops self-verification and search abilities on its own. You can run the experiment yourself for less than $30.

2m read time · From github.com
Table of contents
Installation
Countdown task
Acknowledge
Citation
