A deep dive into the Hamilton-Jacobi-Bellman (HJB) equation, tracing its roots from Bellman's 1952 dynamic programming work through continuous-time reinforcement learning to modern diffusion models. Covers the derivation of HJB for both deterministic and stochastic controlled diffusions, introduces the continuous-time Q-function, and presents a neural policy iteration algorithm using MLPs for value and policy networks. Benchmarks include the linear-quadratic regulator and Merton's portfolio problem. The post also shows how diffusion model training can be recast as a finite-horizon stochastic optimal control problem, with the score function emerging as the optimal control law.
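For orientation, the finite-horizon stochastic HJB equation the post derives takes the standard textbook form sketched below; the symbols (drift b, diffusion σ, running reward r, terminal reward g) are conventional placeholders assumed here, not necessarily the post's own notation.

```latex
% Finite-horizon stochastic HJB equation for the controlled diffusion
%   dX_t = b(t, X_t, a_t) dt + sigma(t, X_t, a_t) dW_t.
% Notation (b, sigma, r, g) is a standard textbook convention assumed
% here; the post's own symbols may differ.
\partial_t V(t,x)
  + \sup_{a} \Big[\, b(t,x,a)^{\top} \nabla_x V(t,x)
  + \tfrac{1}{2} \operatorname{tr}\!\big( \sigma\sigma^{\top}(t,x,a)\, \nabla_x^2 V(t,x) \big)
  + r(t,x,a) \Big] = 0,
\qquad V(T,x) = g(x).
```

In the infinite-horizon discounted case, the time derivative and terminal condition are replaced by a discount term ρV(x) on the left-hand side. In the diffusion-model recasting the summary mentions, the costs are chosen so that the optimal control of this equation matches the score ∇_x log p_t(x), up to a scaling convention.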

9 min read · From dani2442.github.io
Table of contents
1. Introduction
2. Continuous-time Reinforcement Learning
3. Diffusion Models
References
Appendix A: LQR Derivation
Appendix B: Merton Derivation
Appendix C: Non-autonomous and Finite-Horizon Cases
