LLaMA-Omni is a high-quality, low-latency, end-to-end speech interaction model built on Llama-3.1-8B-Instruct. It can generate both text and speech responses with latency as low as 226ms. The model was trained in less than 3 days using 4 GPUs. Setup involves cloning the repository, installing necessary packages, and downloading models from Huggingface and other sources. A Gradio web server can be used for interaction.
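The setup flow described above can be sketched as a short shell sequence. The repository URL matches the project's GitHub organization, but the editable-install step and the exact model names are assumptions; the heavy steps are wrapped in a function so they run only when invoked, and the project README should be consulted for the authoritative commands and download links.

```shell
# Sketch of the setup flow, assuming a standard git + pip workflow.
# Wrapped in a function so cloning/installing happens only when called.
setup_llama_omni() {
  git clone https://github.com/ictnlp/LLaMA-Omni.git
  cd LLaMA-Omni || return 1
  pip install -e .   # assumed: install the package and its Python dependencies
  # Then download the required models (names/sources are assumptions):
  #   - Llama-3.1-8B-Instruct weights from Huggingface
  #   - the speech encoder and vocoder the README points to
}
```

Once the environment is set up this way, the Gradio web server mentioned above can be launched from inside the cloned repository.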
Table of contents
- 💡 Highlights
- Install
- Quick Start
- Gradio Demo
- Local Inference
- LICENSE
- Acknowledgements
- Citation
- Star History