StreamVoice is a novel streaming, language-model-based method for zero-shot voice conversion. It achieves real-time conversion without requiring the complete source speech, and it exhibits high speaker similarity. The conversion process has a latency of 124 ms.