A walkthrough of integrating the Gemini Live API with Twilio to build phone-based voice AI agents, deployed on Google Cloud Run. The setup uses the Google Gen AI Python SDK with a FastAPI server that proxies WebSocket connections between Twilio media streams and the Gemini Live API. Key technical details include audio format conversion (Gemini outputs 24kHz 16-bit PCM, while Twilio expects 8kHz mu-law), inbound and outbound call handling via TwiML, and Secret Manager for storing the API key. The same server also supports browser-based interaction with camera sharing. Partner integrations such as LiveKit, Pipecat, and Agora are mentioned as alternatives for those who would rather not handle audio conversion manually.
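The audio conversion mentioned above can be sketched roughly as follows. This is a minimal illustration, not the walkthrough's actual code: it assumes Gemini's output arrives as little-endian 16-bit mono PCM at 24kHz, downsamples by averaging each group of three samples (a crude low-pass; a production server would likely use a proper resampler), and applies the standard G.711 mu-law encoding. The helper names `linear_to_mulaw`, `pcm24k_to_mulaw8k`, and `twilio_media_message` are hypothetical; the JSON shape follows the `media` message format that Twilio Media Streams accepts over the WebSocket.

```python
import base64
import json
import struct

BIAS = 0x84    # G.711 mu-law bias added before finding the segment
CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # After the bias, magnitude is in [132, 32767], so bit_length is 8..15
    # and the segment (exponent) is 0..7.
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm24k_to_mulaw8k(pcm: bytes) -> bytes:
    """Convert 24kHz 16-bit little-endian mono PCM to 8kHz mu-law.

    Assumption: 3:1 decimation by averaging, which is only a rough filter.
    Trailing bytes that do not fill a group of three samples are dropped.
    """
    usable = len(pcm) - len(pcm) % 6  # 3 samples x 2 bytes each
    samples = struct.unpack(f"<{usable // 2}h", pcm[:usable])
    out = bytearray()
    for i in range(0, len(samples), 3):
        averaged = sum(samples[i:i + 3]) // 3
        out.append(linear_to_mulaw(averaged))
    return bytes(out)


def twilio_media_message(stream_sid: str, mulaw: bytes) -> str:
    """Wrap mu-law audio in a Twilio Media Streams 'media' message."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw).decode("ascii")},
    })
```

In a proxy like the one described, each audio chunk received from Gemini would pass through `pcm24k_to_mulaw8k` and then be sent to Twilio as the string returned by `twilio_media_message`. The reverse direction (Twilio's 8kHz mu-law up to the PCM format Gemini accepts as input) needs the matching decode and upsample step, which is not shown here.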