Google DeepMind has released Gemini Robotics-ER 1.6, an upgraded embodied reasoning model for robotics applications. Key improvements include enhanced spatial reasoning (pointing, counting, object detection), better multi-view understanding across multiple camera streams, and a new instrument-reading capability, developed with Boston Dynamics, that lets robots accurately read analog gauges and sight glasses in industrial facilities. The model uses agentic vision, combining visual reasoning with code execution: it can zoom into image regions, point at objects, and run code to produce precise readings. Safety is also improved, with better refusal of adversarial spatial requests and stronger adherence to physical constraints. The model is available via the Gemini API and Google AI Studio, and a Colab notebook is provided for developers to get started.
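As a minimal sketch of how the pointing output might be consumed: Gemini's pointing responses are conventionally JSON objects with `[y, x]` coordinates normalized to a 0-1000 grid, which a client converts to pixel coordinates for its own camera resolution. The sample response, the JSON shape, and the commented-out model name are assumptions for illustration, not confirmed details of this release.

```python
import json

# Hypothetical pointing response: Gemini pointing outputs are typically JSON
# with [y, x] coordinates normalized to a 0-1000 grid (an assumption here).
SAMPLE_RESPONSE = json.dumps([
    {"point": [400, 250], "label": "pressure gauge"},
    {"point": [720, 880], "label": "sight glass"},
])

# A live call would look roughly like this (requires an API key; the model
# name below is a guess and should be checked against the docs):
#
#   from google import genai
#   client = genai.Client()
#   resp = client.models.generate_content(
#       model="gemini-robotics-er-...",
#       contents=[camera_image, "Point to every gauge, as JSON [{'point': [y, x], 'label': ...}]"],
#   )
#   response_text = resp.text

def points_to_pixels(response_text, width, height):
    """Convert normalized [y, x] points (0-1000 grid) to (x, y) pixel coords."""
    points = json.loads(response_text)
    return [
        {
            "label": p["label"],
            "x": round(p["point"][1] / 1000 * width),
            "y": round(p["point"][0] / 1000 * height),
        }
        for p in points
    ]

pixels = points_to_pixels(SAMPLE_RESPONSE, width=1920, height=1080)
print(pixels[0])  # → {'label': 'pressure gauge', 'x': 480, 'y': 432}
```

Downstream, these pixel coordinates could seed a crop-and-zoom step so the model can re-read a gauge face at higher resolution, which is the kind of loop the agentic-vision description above implies.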