LLaMA-Omni, developed by researchers from the University of Chinese Academy of Sciences, is a novel model architecture designed for low-latency, high-quality speech interaction with large language models (LLMs). It integrates a speech encoder, a speech adaptor, an LLM, and a streaming speech decoder to enable seamless speech-to-speech communication, bypassing intermediate text transcription. Its architecture, together with the specialized InstructS2S-200K dataset, allows it to outperform previous speech-language models in both response content and style while achieving a response latency as low as 226 ms. Its efficient training process makes it a strong candidate for real-time speech-based interaction.
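The component chain described above (speech encoder → adaptor → LLM → streaming speech decoder) can be sketched as a toy pipeline. This is a hypothetical illustration of the data flow only, not the authors' implementation; all function names and the toy transformations are assumptions:

```python
# Toy sketch of the LLaMA-Omni-style speech-to-speech data flow.
# All names and operations here are illustrative placeholders.

def speech_encoder(waveform):
    # Encoder (e.g. Whisper-style): raw samples -> frame-level features.
    # Here: average every 4 samples into one "feature".
    return [sum(waveform[i:i + 4]) / 4 for i in range(0, len(waveform), 4)]

def speech_adaptor(features):
    # Adaptor: downsample and project features into the LLM's
    # embedding space (here: keep every 2nd feature, scale it).
    return [f * 0.5 for f in features[::2]]

def llm(embeddings):
    # The LLM consumes speech embeddings directly -- no intermediate
    # text transcription -- and produces hidden states.
    return [e + 1.0 for e in embeddings]

def streaming_speech_decoder(hidden_states):
    # Streaming decoder: emits discrete speech units incrementally
    # as hidden states arrive, which is what keeps latency low.
    for h in hidden_states:
        yield int(h * 10) % 100  # toy discrete speech unit

def speech_to_speech(waveform):
    feats = speech_encoder(waveform)
    embs = speech_adaptor(feats)
    hidden = llm(embs)
    return list(streaming_speech_decoder(hidden))
```

The key point the sketch captures is that the decoder is a generator: it can start emitting speech units before the full response is computed, which is what enables sub-second response latency.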

From marktechpost.com · 5 min read
