Physical Intelligence presents its vision-language-action (VLA) models, which enable robots to perform complex, dexterous tasks in real-world environments. Unlike traditional robotics, which is largely limited to structured factory settings, the company's approach combines multimodal AI with robotics to build models that generalize across different robots and environments. The team has developed a data collection pipeline based on human teleoperation, scaling its training data from 3,800 hours to over 10,000 hours of successful robot demonstrations. Its latest model, PI-0.5, can perform long-horizon tasks lasting up to 10 minutes in homes it has never seen before, which the company presents as evidence of generalization to new environments. The company emphasizes that software intelligence, rather than hardware, is the main bottleneck to scaling robotics.

18m watch time
