Explores advanced task and motion planning (TAMP) systems that integrate perception with robot manipulation for long-horizon tasks. Covers the OWL-TAMP, VLM-TAMP, and NOD-TAMP frameworks, which use vision-language models to translate natural language instructions into robot actions. Introduces cuTAMP, a GPU-accelerated planner that solves complex manipulation tasks in seconds, and the Fail2Progress framework, which lets robots learn from their failures using Stein variational inference. Together, these approaches address the limitations of traditional TAMP by enabling robots to adapt to dynamic environments and handle multi-step tasks of 30-50 sequential actions.
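As background for the Stein variational inference mentioned above, here is a minimal sketch of Stein variational gradient descent (SVGD) on a toy 1-D Gaussian target. This illustrates only the generic particle update, not the Fail2Progress method itself; the function names, kernel bandwidth, and step size are illustrative choices.

```python
import numpy as np

def svgd_step(particles, grad_log_p, h=0.5, eps=0.1):
    """One SVGD update: attract particles toward high target density
    via the score, while a kernel-gradient term repels them apart."""
    n = particles.shape[0]
    diffs = particles[:, None] - particles[None, :]   # pairwise x_i - x_j
    k = np.exp(-diffs**2 / (2 * h))                   # RBF kernel matrix
    glp = grad_log_p(particles)                       # score of target at each particle
    attract = k @ glp                                 # kernel-weighted score term
    repel = (diffs * k).sum(axis=1) / h               # sum_j grad_{x_j} k(x_j, x_i)
    return particles + eps * (attract + repel) / n

# Toy target: standard normal, whose score is -x.
rng = np.random.default_rng(0)
x = rng.uniform(-6.0, 6.0, size=50)
for _ in range(500):
    x = svgd_step(x, lambda p: -p)
```

After the updates, the particle cloud approximates the target: its mean is near 0 and its spread near 1, without ever sampling from the target directly.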

Table of contents

- How task and motion planning transforms vision and language into robot action
- How cuTAMP accelerates robot planning with GPU parallelization
- How robots learn from failures using Stein variational inference
- Getting started
- Acknowledgments