Ichigo, previously known as llama3-s, is a custom-built early-fusion speech model with improved multi-turn capabilities and the ability to refuse inaudible queries. Since the rebrand, the model has continued to evolve with cleaner training data and additional functionality. It uses techniques inspired by Meta's Chameleon paper and incorporates synthetic noise data to improve the user experience. The project is open to collaboration and aims to give text-based LLMs native listening capabilities.
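"Early fusion" in the Chameleon sense means that a non-text modality is converted into discrete tokens that share a single sequence and vocabulary with text, so one transformer processes both. The sketch below illustrates that general idea for audio under stated assumptions: the vocabulary size, the start/end marker IDs, the toy two-dimensional codebook, and the helper names `quantize` and `fuse` are all hypothetical and are not taken from Ichigo's actual tokenizer. It quantizes each audio feature frame to its nearest codebook entry, then maps those code indices into reserved token IDs placed after the text vocabulary.

```python
from typing import List, Sequence

# Hypothetical sizes and IDs for illustration only; not Ichigo's real configuration.
TEXT_VOCAB_SIZE = 32_000
SOUND_START = TEXT_VOCAB_SIZE       # marks the beginning of an audio span
SOUND_END = TEXT_VOCAB_SIZE + 1     # marks the end of an audio span
SOUND_OFFSET = TEXT_VOCAB_SIZE + 2  # first token ID reserved for audio codes


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each audio feature frame to the index of its nearest codebook
    vector (squared Euclidean distance), i.e. plain vector quantization."""
    codes = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
                 for centroid in codebook]
        codes.append(min(range(len(codebook)), key=dists.__getitem__))
    return codes


def fuse(text_ids: List[int], audio_codes: List[int]) -> List[int]:
    """Build one early-fusion sequence: audio codes become ordinary token IDs
    in an extended vocabulary, wrapped in start/end markers, then text follows."""
    audio_ids = [SOUND_OFFSET + c for c in audio_codes]
    return [SOUND_START, *audio_ids, SOUND_END, *text_ids]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]   # toy 3-entry codebook
    frames = [[0.9, 1.1], [0.1, -0.2], [-0.8, 0.9]]    # toy audio feature frames
    codes = quantize(frames, codebook)
    print(codes)                  # [1, 0, 2]
    print(fuse([15, 27], codes))  # [32000, 32003, 32002, 32004, 32001, 15, 27]
```

Because audio ends up as ordinary token IDs, a text LLM can in principle be extended to "listen" by enlarging its embedding table rather than bolting on a separate audio encoder at inference time; how Ichigo itself produces its audio tokens is not described in this summary.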

4m read time · From github.com