GRASP is a new gradient-based planner for learned world models that makes long-horizon planning practical. It combines three ideas: (1) lifting the trajectory into virtual states, which enables parallel optimization across time, (2) injecting Gaussian noise into the state iterates for exploration, and (3) stopping the brittle state-input gradients through the world model while preserving the clean action gradients. The key insight is that state Jacobians of deep learning-based world models suffer from adversarial-robustness issues (analogous to adversarial examples in vision), making direct state optimization unreliable, whereas action Jacobians remain well-behaved. GRASP combines this stop-gradient collocation objective with dense goal shaping and periodic synchronization with true rollout gradients. On Push-T benchmarks, GRASP outperforms CEM, vanilla gradient descent, and LatCo at horizons H = 50–80, achieving higher success rates and faster planning times.
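The three ingredients above can be sketched on a toy problem. The snippet below is a minimal illustration, not GRASP itself: it assumes a known linear model s' = As + Ba in place of a learned world model, and the function name `grasp_plan`, the horizon, and all hyperparameters are illustrative choices. It optimizes lifted virtual states jointly with actions, anneals Gaussian noise on the state iterates, and applies the stop-gradient rule: the state update keeps only the collocation-target term and the dense goal-shaping term (dropping the state-input Jacobian contribution), while the action update uses the full gradient.

```python
import numpy as np

def grasp_plan(A, B, s0, goal, H=10, iters=3000, lr=0.05,
               noise_std=0.02, lam=0.1, seed=0):
    """Toy GRASP-style collocation planner on a known linear model s' = As + Ba.

    Jointly optimizes virtual states s_1..s_H and actions a_0..a_{H-1}.
    Noise on the state iterates is annealed to zero; the state gradient
    stops the brittle state-input term (no A^T contribution), while the
    action gradient keeps the well-behaved B^T term.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], B.shape[1]
    states = np.zeros((H, n))    # lifted/virtual states s_1..s_H
    actions = np.zeros((H, m))   # actions a_0..a_{H-1}
    for i in range(iters):
        std = noise_std * (1.0 - i / iters)          # annealed exploration noise
        noisy = states + rng.normal(0.0, std, states.shape)
        prev = np.vstack([s0[None, :], noisy[:-1]])  # model inputs s_0..s_{H-1}
        pred = prev @ A.T + actions @ B.T            # model prediction f(s_t, a_t)
        defect = states - pred                       # collocation defects
        # Action gradient: full gradient of the defect through B (kept).
        grad_a = -2.0 * defect @ B
        # State gradient: collocation-target term plus dense goal shaping;
        # the state-input term (-2 * defect @ A) is stopped.
        grad_s = 2.0 * defect + 2.0 * lam * (states - goal)
        actions -= lr * grad_a
        states -= lr * grad_s
    return states, actions
```

At the fixed point the defects vanish and every virtual state sits at the goal, so rolling the optimized actions through the true dynamics reproduces the planned trajectory; with a neural world model the same structure applies, with the stop-gradient guarding against the unreliable state Jacobians.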

Table of contents
- What is a world model?
- Planning: choosing actions by optimizing through the model
- Why long-horizon planning is hard (even when everything is differentiable)
- A long-horizon fix: lifting the dynamics constraint
- An issue for deep learning-based world models: sensitivity of state-input gradients
- GRASP: Gradient RelAxed Stochastic Planner
  - Ingredient 1: Exploration by noising the state iterates
  - Ingredient 2: Reshape gradients: stop brittle state-input gradients, keep action gradients
  - Periodic “sync”: briefly return to true rollout gradients
- How GRASP addresses long-range planning
- What’s next?
- Citation