Best of Audio Processing — 2025

  1. Article
    Hacker News · 1y

    VERT.sh

    VERT.sh converts image, video, and audio files with no file size limit and no ads, processing them locally on your device. It supports a wide range of file formats and is fully open source. Video conversion can be set up locally by following the guide provided.

  2. Article
    Jeff Geerling · 1y

    Raspberry Pi cluster spotted inside $6k audio processor

    The Orban Optimod 5000-series audio processors, which cost between $6,000 and $15,000, include a 3-node Raspberry Pi cluster. Each node serves a different function: one handles remote control and firmware updates, another handles multi-stream audio processing, and an optional third watermarks audio streams. The setup, built on Pi CM4/CM5 modules, is popular among broadcasters for its power efficiency and long-term vendor support.

  3. Article
    Hacker News · 43w

    KittenML/KittenTTS: State-of-the-art TTS model under 25MB 😻

    KittenTTS is an ultra-lightweight open-source text-to-speech model with only 15 million parameters and a size under 25 MB. It runs on CPU with no GPU required, offers multiple voice options, and is optimized for real-time speech synthesis. The model is currently in developer preview, with a full release, a mobile SDK, and a web version planned.

  4. Video
    Wawa Sensei · 49w

    Real-Time Lipsync for Web: Build AI Chatbots & Games with Wawa-Lipsync (Free & Open-Source!)

    Wawa-Lipsync is a new open-source JavaScript library that enables real-time lip synchronization for web applications. Unlike existing solutions, which are either expensive or slow, it analyzes audio frequencies in real time to generate visemes (visual representations of phonemes) that can animate 2D or 3D characters. The library uses the browser's analyser node to detect audio patterns and deduce mouth movements, which makes it suitable for AI chatbots, games, and interactive web experiences without server-side processing delays.
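
To make the frequency-to-viseme idea concrete, here is a minimal sketch in Python. It is not Wawa-Lipsync's actual algorithm or API: the band boundaries, the silence threshold, and the four mouth-shape labels are all hypothetical, and a naive DFT stands in for the browser's frequency analysis.

```python
import cmath
import math

# Hypothetical frequency bands in Hz (not taken from Wawa-Lipsync).
BANDS = [(50, 500), (500, 2000), (2000, 4000)]


def band_energies(frame, sample_rate, bands):
    """Sum the DFT magnitudes that fall inside each (low_hz, high_hz) band."""
    n = len(frame)
    energies = [0.0] * len(bands)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitude = abs(coeff) / n
        for i, (low_hz, high_hz) in enumerate(bands):
            if low_hz <= freq < high_hz:
                energies[i] += magnitude
    return energies


def guess_viseme(frame, sample_rate=8000, silence_threshold=0.01):
    """Map one audio frame to a coarse, hypothetical mouth-shape label."""
    low, mid, high = band_energies(frame, sample_rate, BANDS)
    if low + mid + high < silence_threshold:
        return "closed"   # no meaningful energy: mouth at rest
    if high >= low and high >= mid:
        return "narrow"   # energy mostly high: sibilant-like sounds
    if mid >= low:
        return "wide"     # energy mostly mid-range: open vowels
    return "round"        # energy mostly low: rounded vowels


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 250 * t / 8000) for t in range(256)]
    print(guess_viseme(tone))
```

With these assumed bands, a pure 250 Hz tone lands in the low band and is labeled "round", while an all-zero frame is labeled "closed". A real-time version would run this per animation frame and smooth the labels over time.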

  5. Article
    Hacker News · 1y

    pipecat-ai/smart-turn

    Pipecat-AI's smart-turn is an open-source, community-driven audio turn detection model for conversational voice AI systems. It uses Meta AI's Wav2Vec2-BERT as its backbone and aims to recognize the end of a speaker's turn the way a human listener would, going beyond traditional voice activity detection. The model is still in its initial phases and currently supports English only, with limited training data. Future goals include multi-language support, faster inference, and more diverse training data. Contributions and experimentation from the community are encouraged.
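
For contrast, the traditional baseline that a turn detection model aims to improve on can be sketched as an energy-based silence timeout. The Python below is only that baseline, not smart-turn itself; the function names, the energy threshold, and the frame count are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_turn_index(frames, energy_threshold=0.02, silence_frames=3):
    """Return the index of the frame where a silence timeout fires, or None.

    The turn is declared over once speech has been heard and then
    `silence_frames` consecutive frames fall below `energy_threshold`.
    """
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None
```

A detector like this cannot tell a mid-sentence pause from a finished thought: it fires on any sufficiently long silence, which is the limitation a learned turn detection model targets.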

  6. Article
    Deepgram · 46w

    How to Build a Speech-to-Text (STT) Note Taking App in Python

    A comprehensive guide to building a speech-to-text note-taking application using Python, Deepgram's API, and LLMs. The tutorial covers audio recording with pyaudio, transcription with speaker diarization and timestamps, and intelligent post-processing that uses structured outputs from Google's Gemini API to generate summaries, chapters, and action items. It includes complete code examples and discusses extensions such as UI integration and Obsidian packaging.
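
As a rough illustration of the step between transcription and post-processing, the sketch below groups diarized, timestamped words into per-speaker note lines. It makes no API calls, and the input shape (dicts with `word`, `start`, and `speaker` keys) is an assumption for illustration rather than the tutorial's code or a guaranteed response format.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def words_to_notes(words):
    """Group consecutive words by speaker into timestamped note lines.

    `words` is an assumed shape: a list of dicts with "word", "start"
    (seconds), and "speaker" (integer label) keys.
    """
    turns = []  # each entry: [start_seconds, speaker, [words...]]
    for w in words:
        if not turns or turns[-1][1] != w["speaker"]:
            turns.append([w["start"], w["speaker"], [w["word"]]])
        else:
            turns[-1][2].append(w["word"])
    return [
        f"[{format_timestamp(start)}] Speaker {speaker}: {' '.join(tokens)}"
        for start, speaker, tokens in turns
    ]


if __name__ == "__main__":
    sample = [
        {"word": "hello", "start": 0.4, "speaker": 0},
        {"word": "there", "start": 0.9, "speaker": 0},
        {"word": "hi", "start": 65.2, "speaker": 1},
    ]
    for line in words_to_notes(sample):
        print(line)
```

Lines in this form are compact enough to pass to an LLM as the input for summaries, chapters, and action items.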

  7. Video
    ByteGrad · 44w

    Build A Sick AI-Voice Memo App W/ Next.js + OpenAI Whisper (Background Jobs / Cron Job / Inngest)

    A comprehensive tutorial on building an AI voice memo application using Next.js, OpenAI Whisper for transcription, and Inngest for background job processing. The app records audio, transcribes it to text, extracts tasks and deadlines using GPT-4, and stores everything in a database. The key focus is on working around common Next.js limitations in background processing, queuing, and cron jobs by using Inngest's event-driven architecture, which has built-in observability and retry mechanisms.

  8. Video
    Laravel Daily · 23w

    Laravel AI SDK: First "Teaser" by Taylor Otwell #laravel

    Taylor Otwell teased a new Laravel AI SDK coming in 2026 by sharing a screenshot that shows audio generation and transcription capabilities. The SDK will be previewed at an event in San Francisco on January 14th, with a potential release alongside Laravel 13 in March 2026. The teaser demonstrates methods for generating audio from text and transcribing audio back to text.