Best of Speech Recognition, July 2025

  1. Article
    Deepgram

    Announcing Deepgram Saga: The Voice OS for Developers

    Deepgram launches Saga, a Voice OS that lets developers control their entire development workflow through natural speech commands. Saga integrates with existing tools and protocols such as Cursor, MCP (Model Context Protocol), and Slack, so developers can execute tasks across their tech stack without switching context. The platform can turn rough ideas into precise prompts, generate code from plain speech, manage end-to-end workflows, and structure thoughts into documentation. Unlike traditional voice assistants, Saga embeds directly into developer workflows rather than operating as a separate interface.

  2. Video
    Sam Witteveen

    Kyutai STT & TTS - A Perfect Local Voice Solution?

    Kyutai has released separate speech-to-text and text-to-speech models that offer low-latency voice processing for English and French. The TTS model has only 1.6B parameters and performs competitively with commercial solutions such as ElevenLabs. Although the models support voice cloning through embeddings, the voice embedding model itself has not been released, for ethical reasons. Users can blend existing voice embeddings to create new voices but cannot generate embeddings from their own audio samples. The models show promise for local voice applications but are currently limited by their language support and the restricted voice cloning capability.

  3. Article
    Deepgram

    How to Build a Speech-to-Text (STT) Note Taking App in Python

    A comprehensive guide to building a speech-to-text note-taking application with Python, Deepgram's API, and LLMs. The tutorial covers audio recording with pyaudio, transcription with speaker diarization and timestamps, and post-processing that uses structured outputs from Google's Gemini API to generate summaries, chapters, and action items. It includes complete code examples and discusses extensions such as adding a UI and packaging the app for Obsidian.
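As a rough illustration of the diarization-and-timestamps step, here is a minimal sketch that turns a transcript's word list into timestamped speaker turns suitable for notes. It assumes the transcript has already been fetched. It also assumes each word is a dict with `word`, `punctuated_word`, `start`, `end` (in seconds), and `speaker` keys, roughly the shape Deepgram returns per word when diarization is enabled. Check this against the current API response. The helper names `group_speaker_turns`, `format_timestamp`, and `to_notes` are hypothetical and are not part of Deepgram's SDK or the tutorial's code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Turn:
    """One contiguous stretch of speech by a single speaker."""
    speaker: int
    start: float
    end: float
    text: str


def group_speaker_turns(words: list[dict[str, Any]]) -> list[Turn]:
    """Merge consecutive words from the same speaker into one turn.

    Assumes each word dict has 'word', 'start', 'end', and 'speaker' keys,
    and optionally 'punctuated_word' (an assumed, Deepgram-like shape).
    """
    turns: list[Turn] = []
    for w in words:
        # Prefer the punctuated form when present, else fall back to the raw word.
        token = w.get("punctuated_word", w["word"])
        if turns and turns[-1].speaker == w["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1].end = w["end"]
            turns[-1].text += " " + token
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(w["speaker"], w["start"], w["end"], token))
    return turns


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def to_notes(turns: list[Turn]) -> str:
    """Produce one note line per turn: [MM:SS] Speaker N: text."""
    return "\n".join(
        f"[{format_timestamp(t.start)}] Speaker {t.speaker}: {t.text}" for t in turns
    )


if __name__ == "__main__":
    # Hand-written sample data in the assumed shape (not real API output).
    sample_words = [
        {"word": "hello", "punctuated_word": "Hello,", "start": 0.0, "end": 0.4, "speaker": 0},
        {"word": "everyone", "punctuated_word": "everyone.", "start": 0.4, "end": 0.9, "speaker": 0},
        {"word": "hi", "punctuated_word": "Hi!", "start": 61.2, "end": 61.5, "speaker": 1},
    ]
    print(to_notes(group_speaker_turns(sample_words)))
    # [00:00] Speaker 0: Hello, everyone.
    # [01:01] Speaker 1: Hi!
```

The speaker-turn text produced this way is a convenient input for an LLM summarization step, because each line already carries who spoke and when.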