Best of Audio Processing — 2024

1
Article
Hacker News·2y
JoinMusic/fish: YouTube video to chords, lyrics, beat and melody.
An AI-powered multimodal project employs various transformer models to generate chords, beats, lyrics, melody, and tabs for any song from YouTube videos. The system includes models like U-Net for audio separation, Pitch-Net for melody tracking, Beat-Net for tempo tracking, and Chord-Net for chord recognition. It supports multiple languages and allows for editable sheet music creation. Utilizing a combination of STFT, MFCC, and chroma features, it ensures better generalization with minimal training data.
31
2
Article
Real Python·2y
Build a Guitar Synthesizer: Play Musical Tablature in Python – Real Python
Learn how to build a guitar synthesizer using Python by implementing the Karplus-Strong plucked string synthesis algorithm. This guide walks you through mimicking different string instruments and their tunings, combining multiple vibrating strings into polyphonic chords, and simulating realistic guitar playing techniques. You'll use tools like NumPy and Pedalboard while handling musical notes, scientific pitch notation, and guitar tablature. The project also covers setting up your development environment and managing dependencies with Poetry.
20
3
Article
Hacker News·2y
abus-aikorea/voice-pro: Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2,
Voice-Pro is a comprehensive Gradio WebUI for audio processing that supports transcription, translation, text-to-speech, and voice cloning. It features real-time and batch modes, a YouTube downloader, vocal remover, and support for over 100 languages. Voice-Pro requires a Windows system, NVIDIA GPU, and an internet connection for installation. It provides tabs for various functions including transcription, translation, TTS, and live translation. Installation is simplified with one-click setup scripts.
19
1
4
Article
Daily Dose of Data Science | Avi Chawla | Substack·2y
The Utility of Vector Databases in LLMs
Explore the surge in interest for vector databases, which are essential for storing and querying unstructured data. These databases are now crucial for enhancing the functionality of Large Language Models (LLMs) by embedding unstructured data into high-dimensional vectors. Learn how AssemblyAI's LeMUR framework simplifies building LLM apps on audio data, and how vector databases avoid the need for costly and complex fine-tuning of LLMs.
16
5
Article
GoPenAI·2y
OpenAI Text-to-Speech 📢: Bridging Language Barriers with Versatile Voice Solutions
OpenAI's Text-to-Speech (TTS) model offers versatile voice solutions, supporting various languages and audio formats. The setup involves creating a Python project, using OpenAI's API, and configuring voice and speech speed options. The model supports multiple voices, including 'alloy' and 'echo', and can convert text into speech in numerous languages, such as Spanish. Supported audio formats include MP3, AAC, FLAC, WAV, and PCM.
16
6
Article
Hacker News·2y
lifeiteng/OmniSenseVoice: Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
Omni SenseVoice, built on SenseVoice, offers high-speed and precise audio transcription with features like automatic language detection, GPU support, and quantized models for faster processing. Users can achieve up to 50x faster processing without sacrificing accuracy. The tool can be easily installed via pip and provides several key options for customization.
15
7
Article
iO tech_hub·2y
Scratching the web
Learn how to use the Web Audio API to create a realtime turntable effect that can load, play, reverse, and control the playback speed of an audio track. The post covers fundamental concepts, including record rotation, dragging for scratching effects, and tying audio playback to these actions, with code examples provided to illustrate the implementation.
12

See all Audio Processing archives