Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward.

A practical guide to using Gemini's spatial understanding capabilities for open-vocabulary object detection and image editing. The tutorial covers detecting visual objects in photos of books, magazines, and electronics using natural language prompts (no model training required), extracting bounding boxes and metadata via structured Pydantic outputs, and then using Gemini's image generation models (Nano Banana) to restore, straighten, and colorize the detected objects. Includes full Python code using the Google Gen AI SDK, with examples ranging from 15th-century woodcuts to modern circuit boards.
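To make the bounding-box step concrete, here is a minimal sketch of turning one detection into pixel coordinates for cropping. It rests on assumptions not stated in the summary above: that each box arrives as `[ymin, xmin, ymax, xmax]` scaled to a 0-1000 range (the convention Gemini's documentation describes), and that a detection carries a `label` and a `box_2d` field. Both field names and the `to_pixel_box` helper are hypothetical. A plain dataclass stands in for the Pydantic model so the snippet has no dependencies; the same fields would carry over to a Pydantic schema.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One detection. Field names are illustrative, not taken from the tutorial."""
    label: str
    box_2d: list[int]  # assumed order: [ymin, xmin, ymax, xmax], scaled 0-1000


def to_pixel_box(obj: DetectedObject, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    ymin, xmin, ymax, xmax = obj.box_2d
    left = round(xmin * width / 1000)
    top = round(ymin * height / 1000)
    right = round(xmax * width / 1000)
    bottom = round(ymax * height / 1000)
    return left, top, right, bottom


# Example: a box covering the middle of a 2000x1000 photo.
book = DetectedObject(label="book", box_2d=[100, 250, 500, 750])
print(to_pixel_box(book, width=2000, height=1000))
```

The `(left, top, right, bottom)` order matches what common imaging libraries expect for a crop rectangle, so the result can be passed on directly once the image is loaded.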

Detecting and Editing Visual Objects with Gemini