OpenAI launched three new audio API models for developers: GPT-Realtime-2 (handles complex requests, tool calls, interruptions, and long voice sessions), GPT-Realtime-Translate (translates speech from 70+ input languages into 13 output languages), and GPT-Realtime-Whisper (live speech-to-text for captions and meeting notes). The models are available in OpenAI's developer playground. Pricing starts at $32 per million audio input tokens for GPT-Realtime-2, $0.034 per minute for GPT-Realtime-Translate, and $0.017 per minute for GPT-Realtime-Whisper. Early customers include Zillow, Priceline, and Deutsche Telekom.
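To make the quoted prices concrete, here is a rough cost sketch in Python. The three rates are copied from the summary above; the function names and the sample usage figures (a 60-minute session, 250,000 audio input tokens) are hypothetical and chosen only for illustration. Because the prices are described as starting points, and because output tokens and any other charges are not covered here, treat the results as a lower-bound estimate rather than a bill.

```python
# Rates quoted in the summary above, in USD.
REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS = 32.0
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute-billed model for the given duration."""
    return minutes * rate_per_minute


def audio_input_token_cost(
    tokens: int,
    rate_per_million: float = REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS,
) -> float:
    """Cost of audio input tokens only; output tokens are not included."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    # Hypothetical usage figures, for illustration only.
    session_minutes = 60
    input_tokens = 250_000

    translate = per_minute_cost(session_minutes, TRANSLATE_PER_MINUTE)
    whisper = per_minute_cost(session_minutes, WHISPER_PER_MINUTE)
    realtime = audio_input_token_cost(input_tokens)

    print(f"Translate, {session_minutes} min: ${translate:.2f}")
    print(f"Whisper, {session_minutes} min: ${whisper:.2f}")
    print(f"Realtime-2, {input_tokens} input tokens: ${realtime:.2f}")
```

At these rates, translation costs exactly twice as much per minute as transcription. The summary does not say how many audio tokens correspond to a minute of speech, so the token-billed and minute-billed figures cannot be compared directly.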