Gradium, born from the open audio lab Kyutai, demonstrates how small teams of specialized researchers are outperforming major AI labs in audio AI. The team built Moshi, the first full-duplex conversational AI model with 160ms latency, using only 4 researchers in 6 months. Their success stems from deep domain expertise in audio
Table of contents
A brief history of audio ML, and why it’s consistently overlooked
Dynamics of big labs and why small teams of researchers can outperform
Anatomy of training an audio model
Audio model architectures: speech-to-speech vs. full duplex
Why small teams win at audio