Best of Reinforcement Learning, January 2025

  1. Article
    Medium · 1y

    Mathematical Foundation Underpinning Reinforcement Learning

    Reinforcement learning (RL) is inspired by the process of learning from experience, and the Soft Actor-Critic (SAC) algorithm is one popular RL framework. This post covers the mathematical foundation of SAC agents, detailing the actor (policy) network and the critic network. The actor is a neural network that outputs the parameters of a probability distribution over actions, from which it samples actions together with their log-probabilities. The critic network estimates the expected return (Q-value) of state-action pairs. Python code snippets in PyTorch show how these networks are implemented and integrated into an RL model.

  2. Video
    Computerphile · 1y

    Solve Markov Decision Processes with the Value Iteration Algorithm - Computerphile

    The value iteration algorithm solves Markov decision processes (MDPs) by producing optimal action decisions. MDPs model decision-making problems, particularly those under uncertainty. The algorithm repeatedly recomputes the value of every state from the values of its successor states until the values converge. It then derives the policy that maximizes expected reward (or, equivalently, minimizes expected cost). It is a standard dynamic programming technique, and like other such techniques it assumes the transition probabilities and rewards of the MDP are known.

  3. Article
    Hacker News · 1y

    Jiayi-Pan/TinyZero

    TinyZero is based on DeepSeek R1 Zero and built on the veRL framework. It uses reinforcement learning to show a 3B-parameter base language model developing self-verification and search abilities. Running the experiment yourself costs less than $30.
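The Soft Actor-Critic article implements its actor and critic in PyTorch. The sketch below shows, in plain Python, the arithmetic those networks feed into, with no framework involved. The actor samples a tanh-squashed Gaussian action via the reparameterization trick and computes its log-probability. The critics are trained toward a "soft" Bellman target that subtracts an entropy term. The function names, the defaults `gamma=0.99` and `alpha=0.2`, and the `1e-6` numerical epsilon are illustrative assumptions, not values taken from the article. In a real PyTorch agent the same steps would run on tensors so gradients flow through `mean` and `log_std`.

```python
import math
import random


def sample_squashed_gaussian(mean, log_std, rng=random):
    """Sample an action from a tanh-squashed diagonal Gaussian and return
    (action, log_prob). In SAC the actor network would output mean and
    log_std per action dimension; here they are plain lists (assumption)."""
    action, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        eps = rng.gauss(0.0, 1.0)          # noise drawn independently of the parameters
        u = mu + math.exp(ls) * eps        # reparameterization: u = mu + sigma * eps
        # log-density of u under N(mu, sigma^2); note (u - mu) / sigma == eps
        log_prob += -0.5 * eps ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
        a = math.tanh(u)                   # squash into the bounded range (-1, 1)
        # change-of-variables correction for tanh; 1e-6 avoids log(0) (assumed epsilon)
        log_prob -= math.log(1.0 - a ** 2 + 1e-6)
        action.append(a)
    return action, log_prob


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Soft Bellman target for both critics: take the smaller of two target
    Q-values, subtract the entropy term alpha * log pi(a'|s'), then discount."""
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


def actor_loss(q1, q2, log_prob, alpha=0.2):
    """Per-sample actor objective to minimize: alpha * log pi(a|s) - min(Q1, Q2)."""
    return alpha * log_prob - min(q1, q2)


# Usage with hypothetical numbers
rng = random.Random(0)
action, logp = sample_squashed_gaussian([0.0, 0.5], [-1.0, -1.0], rng)
target = soft_q_target(reward=1.0, done=0.0, next_q1=2.0, next_q2=3.0,
                       next_log_prob=0.5, gamma=0.9)  # ≈ 2.71
```

Taking the minimum of two Q-estimates is the clipped double-Q trick SAC uses to curb overestimation. The `alpha` coefficient trades off expected return against policy entropy.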
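The value iteration entry describes an algorithm compact enough to sketch directly. The following is a minimal Python version, assuming a finite MDP with a known model and discounted rewards to be maximized. A cost-minimizing formulation would use costs and `min` instead. The dictionary layout `mdp[state][action] -> [(probability, next_state, reward), ...]` and the toy two-state MDP are hypothetical choices for illustration.

```python
def q_value(V, outcomes, gamma):
    """Expected discounted return of one action: sum of p * (r + gamma * V[s'])."""
    return sum(p * (r + gamma * V[s_next]) for p, s_next, r in outcomes)


def value_iteration(mdp, gamma=0.9, tol=1e-8):
    """mdp[s][a] is a list of (probability, next_state, reward) tuples.
    Returns the converged state values and the greedy policy."""
    V = {s: 0.0 for s in mdp}
    while True:
        # Bellman optimality backup applied to every state synchronously
        new_V = {s: max(q_value(V, mdp[s][a], gamma) for a in mdp[s]) for s in mdp}
        delta = max(abs(new_V[s] - V[s]) for s in mdp)
        V = new_V
        if delta < tol:  # stop once no state value changes noticeably
            break
    # Act greedily with respect to the converged values
    policy = {s: max(mdp[s], key=lambda a, s=s: q_value(V, mdp[s][a], gamma)) for s in mdp}
    return V, policy


# Hypothetical two-state MDP: staying in B pays 2 per step, moving A -> B pays 1
toy_mdp = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
values, policy = value_iteration(toy_mdp)
print(policy)  # {'A': 'go', 'B': 'stay'}
```

With `gamma=0.9` the values approach V(B) = 2 / (1 - 0.9) = 20 and V(A) = 1 + 0.9 · 20 = 19. Convergence is guaranteed because the backup is a contraction whenever `gamma < 1`.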