OpenAI hopes the more natural-sounding voices of its new speech model, gpt-realtime, will lead enterprises to use more AI-generated voices in their applications.

VentureBeat

OpenAI has launched gpt-realtime, a new speech-to-speech AI model designed for enterprise applications, with more natural, expressive voices and improved instruction-following capabilities. The model runs on the newly available Realtime API, which adds enhanced function calling, image recognition, and SIP support for contact center use cases. Facing competition from ElevenLabs, SoundHound, and others in the crowded voice AI market, OpenAI is positioning its offering on higher benchmark scores and a reduced price of $32 per million audio input tokens.

In crowded voice AI market, OpenAI bets on instruction-following and expressive speech to win enterprise adoption