Best of Reinforcement Learning — January 2025

1
Article
Medium·1y
Mathematical Foundation Underpinning Reinforcement Learning
Reinforcement learning (RL) is inspired by the process of learning from experience, and Soft Actor-Critic (SAC) is one of its popular algorithms. This post discusses the mathematical foundation of SAC agents, detailing the actor (policy) and critic networks. The actor is a neural network that outputs actions and their probabilities, while the critic estimates the expected return of state-action pairs. Python code snippets in PyTorch demonstrate how to implement these networks and integrate them into an RL model.
33
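As background for item 1, the quantities the actor and critic work with can be written out. This is a sketch in the standard maximum-entropy notation, not the post's own equations: the temperature \alpha, discount \gamma, and entropy \mathcal{H} are assumed symbols, since the summary does not reproduce them.

```latex
% Maximum-entropy objective the actor (policy \pi) is trained to maximize
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

% Soft state value: expected critic value minus the scaled log-probability
V(s_t) = \mathbb{E}_{a_t \sim \pi}
         \left[ Q(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \right]

% Soft Bellman equation the critic Q is trained to satisfy
Q(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}} \left[ V(s_{t+1}) \right]
```

Setting \alpha = 0 removes the entropy bonus and recovers the ordinary expected-return objective.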
2
Video
Computerphile·1y
Solve Markov Decision Processes with the Value Iteration Algorithm - Computerphile
The value iteration algorithm solves Markov decision processes (MDPs) to obtain an optimal policy. MDPs model sequential decision-making problems, particularly those under uncertainty. The algorithm repeatedly updates an estimate of each state's value until the estimates converge, then reads off the policy that minimizes cost or maximizes reward. It is a dynamic programming technique, so it applies when the model's transitions and rewards are known.
23
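The value iteration procedure summarized in item 2 can be sketched as follows. This is a minimal illustration, not the video's code: the `value_iteration` function, the `transitions` layout, and the two-state example MDP are all hypothetical names and data chosen for the sketch.

```python
from typing import Dict, List, Tuple

# Assumed layout: transitions[state][action] = [(probability, next_state, reward), ...]
# A state with no actions is treated as terminal (its value stays 0).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(
    transitions: Transitions,
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iters: int = 10_000,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return (state values, greedy policy) for a small tabular MDP."""
    values = {s: 0.0 for s in transitions}

    def q_value(outcomes: List[Tuple[float, str, float]]) -> float:
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state
            best = max(q_value(outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break

    # Greedy policy: pick the action with the highest Q-value in each state.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a]))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical example: "safe" reaches the goal for reward 1; "risky" pays 3
# half the time and otherwise leaves the agent where it started.
mdp: Transitions = {
    "start": {
        "safe": [(1.0, "goal", 1.0)],
        "risky": [(0.5, "goal", 3.0), (0.5, "start", 0.0)],
    },
    "goal": {},
}

values, policy = value_iteration(mdp, gamma=0.9)
print(policy["start"], round(values["start"], 4))
```

In this toy MDP the value of "start" under "risky" satisfies V = 0.5 * 3 + 0.5 * 0.9 * V, which beats the reward of 1 from "safe", so the greedy policy chooses "risky".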
3
Article
Hacker News·1y
Jiayi-Pan/TinyZero
TinyZero builds on the DeepSeek R1 Zero approach and is implemented on top of veRL. It shows a 3B-parameter base language model developing self-verification and search abilities through reinforcement learning. The experiment can be run for less than $30.
15
