GRASP is a new gradient-based planner for learned world models that makes long-horizon planning practical. It combines three ideas: (1) lifting the trajectory into virtual states, which enables parallel optimization across time, (2) injecting Gaussian noise into the state iterates for exploration, and (3) stopping the brittle state-input gradients through the world model while preserving the clean action gradients. The key insight is that state Jacobians of deep learning-based world models suffer from adversarial-robustness issues (analogous to adversarial examples in vision), making direct state optimization unreliable, whereas action Jacobians remain well-behaved. GRASP combines this stop-gradient collocation objective with dense goal shaping and periodic synchronization with true rollout gradients. On Push-T benchmarks, GRASP outperforms CEM, vanilla gradient descent, and LatCo at horizons H = 50–80, achieving higher success rates and faster planning times.
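The three ingredients above can be sketched on a toy problem. The snippet below is a minimal illustration, not GRASP itself: it assumes a known linear model s' = As + Ba in place of a learned world model, and the function name `grasp_plan`, the horizon, and all hyperparameters are illustrative choices. It optimizes lifted virtual states jointly with actions, anneals Gaussian noise on the state iterates, and applies the stop-gradient rule: the state update keeps only the collocation-target term and the dense goal-shaping term (dropping the state-input Jacobian contribution), while the action update uses the full gradient.

```python
import numpy as np

def grasp_plan(A, B, s0, goal, H=10, iters=3000, lr=0.05,
               noise_std=0.02, lam=0.1, seed=0):
    """Toy GRASP-style collocation planner on a known linear model s' = As + Ba.

    Jointly optimizes virtual states s_1..s_H and actions a_0..a_{H-1}.
    Noise on the state iterates is annealed to zero; the state gradient
    stops the brittle state-input term (no A^T contribution), while the
    action gradient keeps the well-behaved B^T term.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], B.shape[1]
    states = np.zeros((H, n))    # lifted/virtual states s_1..s_H
    actions = np.zeros((H, m))   # actions a_0..a_{H-1}
    for i in range(iters):
        std = noise_std * (1.0 - i / iters)          # annealed exploration noise
        noisy = states + rng.normal(0.0, std, states.shape)
        prev = np.vstack([s0[None, :], noisy[:-1]])  # model inputs s_0..s_{H-1}
        pred = prev @ A.T + actions @ B.T            # model prediction f(s_t, a_t)
        defect = states - pred                       # collocation defects
        # Action gradient: full gradient of the defect through B (kept).
        grad_a = -2.0 * defect @ B
        # State gradient: collocation-target term plus dense goal shaping;
        # the state-input term (-2 * defect @ A) is stopped.
        grad_s = 2.0 * defect + 2.0 * lam * (states - goal)
        actions -= lr * grad_a
        states -= lr * grad_s
    return states, actions
```

At the fixed point the defects vanish and every virtual state sits at the goal, so rolling the optimized actions through the true dynamics reproduces the planned trajectory; with a neural world model the same structure applies, with the stop-gradient guarding against the unreliable state Jacobians.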

Table of contents
- What is a world model?
- Planning: choosing actions by optimizing through the model
- Why long-horizon planning is hard (even when everything is differentiable)
- A long-horizon fix: lifting the dynamics constraint
- An issue for deep learning-based world models: sensitivity of state-input gradients
- GRASP: Gradient RelAxed Stochastic Planner
  - Ingredient 1: Exploration by noising the state iterates
  - Ingredient 2: Reshape gradients: stop brittle state-input gradients, keep action gradients
  - Periodic “sync”: briefly return to true rollout gradients
- How GRASP addresses long-range planning
- What’s next?
- Citation