A walkthrough of building a real-time multimodal voice agent using Pipecat and Google's Gemini 3 model. The demo shows a travel-planning bot that handles multi-turn conversations, searches for flights and lodging, uses Google Search grounding, and saves trip reports. Highlights include Gemini 3's improved instruction following, enhanced tool calling, and multimodal capabilities. The tutorial covers scaffolding with the Pipecat CLI, defining function call handlers, registering tools with the LLM, and running multiple agents (concierge and language tutor) within a single bot file using the Pipecat Agents module.
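
The handler-and-registration pattern mentioned above can be sketched in plain Python. This is a library-free illustration, not Pipecat's API: `ToolRegistry`, `register`, `dispatch`, and `search_flights` are hypothetical names, and the flight data is a placeholder.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical handler type: receives the arguments the model supplied
# and returns a JSON-like result for the model to read.
ToolHandler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


class ToolRegistry:
    """Maps tool names to async handlers (illustrative stand-in, not Pipecat's API)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        # Later registrations under the same name replace earlier ones.
        self._handlers[name] = handler

    async def dispatch(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._handlers.get(name)
        if handler is None:
            # Return an error payload instead of raising, so the caller
            # can pass it back to the model as the tool result.
            return {"error": f"unknown tool: {name}"}
        return await handler(arguments)


async def search_flights(arguments: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder result; a real handler would call a flight-search service.
    return {
        "origin": arguments["origin"],
        "destination": arguments["destination"],
        "flights": [],
    }


registry = ToolRegistry()
registry.register("search_flights", search_flights)
```

When the model requests a tool call, the bot would look up the handler by name, for example `asyncio.run(registry.dispatch("search_flights", {"origin": "SFO", "destination": "NRT"}))`, and return the result to the model.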

12m watch time
