GRASP is a new gradient-based planner designed to make long-horizon planning with learned world models more practical and robust. The core problems addressed are: exploding/vanishing gradients from backpropagation through time, non-greedy local minima at longer horizons, and adversarial robustness issues when optimizing through high-dimensional deep learning models. GRASP uses a collocation (lifted-state) formulation that parallelizes optimization across time, injects Gaussian noise into virtual state updates for exploration, and avoids brittle state-input gradients by using only action Jacobians. A periodic refinement phase keeps trajectories grounded. Benchmarks on Push-T tasks show GRASP achieves higher success rates and faster planning times than CEM, standard gradient descent, and LatCo across horizons from H=40 to H=80.
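The ingredients above can be combined in a toy sketch. This is not the paper's implementation: the linear model `f`, the horizon, the weights, and the annealing schedule are all illustrative stand-ins (a learned world model would replace `f`). The lifted formulation treats virtual states and actions as joint decision variables, penalizes dynamics defects, perturbs the state iterates with Gaussian noise, and drops the state-input Jacobian of the model so only action Jacobians drive the action updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "world model" f(s, a) = A s + B a, standing in for a learned model.
A = np.eye(2)
B = 0.5 * np.eye(2)

def f(s, a):
    return A @ s + B @ a

H = 10                       # planning horizon
s0 = np.zeros(2)             # fixed initial state
goal = np.array([1.0, 1.0])  # terminal goal
lam = 1.0                    # defect-penalty weight (illustrative)

# Lifted decision variables: virtual states s_1..s_H and actions a_0..a_{H-1}.
S = np.zeros((H, 2))
U = np.zeros((H, 2))

lr, iters = 0.05, 4000
for k in range(iters):
    # Exploration noise on the state iterates, annealed to zero halfway through.
    sigma = 0.05 * max(0.0, 1.0 - 2 * k / iters)
    states = np.vstack([s0, S])  # states[t] = s_t for t = 0..H
    # Dynamics defects d[t] = s_{t+1} - f(s_t, a_t); the collocation penalty
    # lam * ||d||^2 replaces a sequential rollout, so all t update in parallel.
    d = np.array([states[t + 1] - f(states[t], U[t]) for t in range(H)])
    # State gradients keep only the "direct" term of each defect; the brittle
    # state-input Jacobian of f (the -A^T d[t+1] cross-term) is dropped.
    gS = 2 * lam * d
    gS[-1] += 2 * (S[-1] - goal)  # terminal goal cost on s_H
    # Action gradients use the action Jacobian B only.
    gU = np.array([-2 * lam * B.T @ d[t] for t in range(H)])
    S -= lr * gS
    S += sigma * rng.standard_normal(S.shape)  # noisy virtual-state update
    U -= lr * gU

# Recompute defects for the final iterate.
states = np.vstack([s0, S])
d = np.array([states[t + 1] - f(states[t], U[t]) for t in range(H)])
print(np.linalg.norm(S[-1] - goal))  # small: goal reached
print(np.abs(d).max())               # small: trajectory nearly feasible
```

At convergence the defects vanish and the final virtual state sits on the goal, so the optimized actions form a feasible plan. The periodic refinement ("sync") phase described in the article, which briefly returns to true rollout gradients, is omitted here for brevity.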

From robohub.org
Table of contents

- What is a world model?
- Planning: choosing actions by optimizing through the model
- Why long-horizon planning is hard (even when everything is differentiable)
- A long-horizon fix: lifting the dynamics constraint
- An issue for deep learning-based world models: sensitivity of state-input gradients
- GRASP: Gradient RelAxed Stochastic Planner
- Ingredient 1: Exploration by noising the state iterates
- Ingredient 2: Reshape gradients: stop brittle state-input gradients, keep action gradients
- Periodic “sync”: briefly return to true rollout gradients
- How GRASP addresses long-range planning
- What’s next?
- Citation
