A deep dive into the Hamilton-Jacobi-Bellman (HJB) equation, tracing its roots from Bellman's 1952 dynamic programming work through continuous-time reinforcement learning to modern diffusion models. Covers the derivation of HJB for both deterministic and stochastic controlled diffusions, introduces the continuous-time Q-function, and presents a neural policy iteration algorithm using MLPs for value and policy networks. Benchmarks include the linear-quadratic regulator and Merton's portfolio problem. The post also shows how diffusion model training can be recast as a finite-horizon stochastic optimal control problem, with the score function emerging as the optimal control law.
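For orientation, the finite-horizon stochastic HJB equation the post derives takes the standard textbook form sketched below; the symbols (drift b, diffusion σ, running reward r, terminal reward g) are conventional placeholders assumed here, not necessarily the post's own notation.

```latex
% Finite-horizon stochastic HJB equation for the controlled diffusion
%   dX_t = b(t, X_t, a_t) dt + sigma(t, X_t, a_t) dW_t.
% Notation (b, sigma, r, g) is a standard textbook convention assumed
% here; the post's own symbols may differ.
\partial_t V(t,x)
  + \sup_{a} \Big[\, b(t,x,a)^{\top} \nabla_x V(t,x)
  + \tfrac{1}{2} \operatorname{tr}\!\big( \sigma\sigma^{\top}(t,x,a)\, \nabla_x^2 V(t,x) \big)
  + r(t,x,a) \Big] = 0,
\qquad V(T,x) = g(x).
```

In the infinite-horizon discounted case, the time derivative and terminal condition are replaced by a discount term ρV(x) on the left-hand side. In the diffusion-model recasting the summary mentions, the costs are chosen so that the optimal control of this equation matches the score ∇_x log p_t(x), up to a scaling convention.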

9 min read · From dani2442.github.io
Table of contents
1. Introduction
2. Continuous-time Reinforcement Learning
3. Diffusion Models
References
Appendix A: LQR Derivation
Appendix B: Merton Derivation
Appendix C: Non-autonomous and Finite-Horizon Cases
