Abstract page for arXiv paper 2502.03382: High-Fidelity Simultaneous Speech-To-Speech Translation

Hibiki is a decoder-only model for simultaneous speech-to-speech translation: a multistream language model processes source and target speech synchronously. To translate in real time, it is trained with a weakly-supervised method that uses the perplexity of an off-the-shelf text translation system to identify per-word delays and create aligned synthetic data. On a French-English translation task, Hibiki reports state-of-the-art translation quality, speaker fidelity, and naturalness, and its inference is simple enough for batched translation and real-time on-device deployment.
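
The idea of using a translation model's scores to decide how long to wait before emitting each target word can be illustrated with a toy sketch. This is not Hibiki's implementation: the `score` callback, the `toy_score` function, and the `LEXICON` table are all hypothetical stand-ins for a real text translation system, and the "largest log-probability gain" rule is one plausible reading of a perplexity-based alignment.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer: log p(target[j] | target[:j], source[:i]).
Scorer = Callable[[Sequence[str], Sequence[str], int, int], float]


def align_by_log_prob_gain(
    source: Sequence[str], target: Sequence[str], score: Scorer
) -> List[int]:
    """For each target word, return how many source words to wait for."""
    if not source:
        raise ValueError("source must contain at least one word")
    delays: List[int] = []
    for j in range(len(target)):
        # Score target[j] under every source prefix length 0..len(source).
        scores = [score(source, target, i, j) for i in range(len(source) + 1)]
        # Gain contributed by revealing the i-th source word (1-based).
        gains = [scores[i] - scores[i - 1] for i in range(1, len(scores))]
        best = 1 + max(range(len(gains)), key=gains.__getitem__)
        # A target word cannot be emitted before the previous one.
        if delays:
            best = max(best, delays[-1])
        delays.append(best)
    return delays


# Toy stand-in for a translation model (an assumption for illustration only):
# a target word is "likely" once its dictionary counterpart has been heard.
LEXICON = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}


def toy_score(source: Sequence[str], target: Sequence[str], i: int, j: int) -> float:
    return 0.0 if LEXICON[target[j]] in source[:i] else -5.0


source = ["le", "chat", "noir", "dort"]
target = ["the", "black", "cat", "sleeps"]
delays = align_by_log_prob_gain(source, target, toy_score)
print(delays)  # [1, 3, 3, 4]
```

In this toy example "black" has to wait for the third French word ("noir"), and "cat" inherits that delay because target words are emitted in order. Delays of this kind could then be used to time-shift target speech when building aligned training pairs.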
