GRASP is a new gradient-based planner designed to make long-horizon planning with learned world models more practical and robust. The core problems addressed are: exploding/vanishing gradients from backpropagation through time, non-greedy local minima at longer horizons, and adversarial robustness issues when optimizing through high-dimensional deep learning models. GRASP uses a collocation (lifted-state) formulation that parallelizes optimization across time, injects Gaussian noise into virtual state updates for exploration, and avoids brittle state-input gradients by using only action Jacobians. A periodic refinement phase keeps trajectories grounded. Benchmarks on Push-T tasks show GRASP achieves higher success rates and faster planning times than CEM, standard gradient descent, and LatCo across horizons from H=40 to H=80.
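The ingredients above can be combined in a toy sketch. This is not the paper's implementation: the linear model `f`, the horizon, the weights, and the annealing schedule are all illustrative stand-ins (a learned world model would replace `f`). The lifted formulation treats virtual states and actions as joint decision variables, penalizes dynamics defects, perturbs the state iterates with Gaussian noise, and drops the state-input Jacobian of the model so only action Jacobians drive the action updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "world model" f(s, a) = A s + B a, standing in for a learned model.
A = np.eye(2)
B = 0.5 * np.eye(2)

def f(s, a):
    return A @ s + B @ a

H = 10                       # planning horizon
s0 = np.zeros(2)             # fixed initial state
goal = np.array([1.0, 1.0])  # terminal goal
lam = 1.0                    # defect-penalty weight (illustrative)

# Lifted decision variables: virtual states s_1..s_H and actions a_0..a_{H-1}.
S = np.zeros((H, 2))
U = np.zeros((H, 2))

lr, iters = 0.05, 4000
for k in range(iters):
    # Exploration noise on the state iterates, annealed to zero halfway through.
    sigma = 0.05 * max(0.0, 1.0 - 2 * k / iters)
    states = np.vstack([s0, S])  # states[t] = s_t for t = 0..H
    # Dynamics defects d[t] = s_{t+1} - f(s_t, a_t); the collocation penalty
    # lam * ||d||^2 replaces a sequential rollout, so all t update in parallel.
    d = np.array([states[t + 1] - f(states[t], U[t]) for t in range(H)])
    # State gradients keep only the "direct" term of each defect; the brittle
    # state-input Jacobian of f (the -A^T d[t+1] cross-term) is dropped.
    gS = 2 * lam * d
    gS[-1] += 2 * (S[-1] - goal)  # terminal goal cost on s_H
    # Action gradients use the action Jacobian B only.
    gU = np.array([-2 * lam * B.T @ d[t] for t in range(H)])
    S -= lr * gS
    S += sigma * rng.standard_normal(S.shape)  # noisy virtual-state update
    U -= lr * gU

# Recompute defects for the final iterate.
states = np.vstack([s0, S])
d = np.array([states[t + 1] - f(states[t], U[t]) for t in range(H)])
print(np.linalg.norm(S[-1] - goal))  # small: goal reached
print(np.abs(d).max())               # small: trajectory nearly feasible
```

At convergence the defects vanish and the final virtual state sits on the goal, so the optimized actions form a feasible plan. The periodic refinement ("sync") phase described in the article, which briefly returns to true rollout gradients, is omitted here for brevity.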

From robohub.org
Table of contents

- What is a world model?
- Planning: choosing actions by optimizing through the model
- Why long-horizon planning is hard (even when everything is differentiable)
- A long-horizon fix: lifting the dynamics constraint
- An issue for deep learning-based world models: sensitivity of state-input gradients
- GRASP: Gradient RelAxed Stochastic Planner
- Ingredient 1: Exploration by noising the state iterates
- Ingredient 2: Reshape gradients: stop brittle state-input gradients, keep action gradients
- Periodic “sync”: briefly return to true rollout gradients
- How GRASP addresses long-range planning
- What’s next?
- Citation
