This AI Paper from China Introduces StreamVoice: A Novel Language Model-Based Zero-Shot Voice Conversion System Designed for Streaming Scenarios


● Recent advances in language models showcase impressive zero-shot voice conversion (VC) capabilities
● StreamVoice is a novel streaming, language model-based method for zero-shot voice conversion that converts speech in real time given any speaker prompt and source speech
● StreamVoice achieves streaming capability by employing a fully causal context-aware LM with a temporal-independent acoustic predictor
● It uses teacher-guided context foresight and a semantic masking strategy to mitigate the performance degradation that streaming processing can cause
● StreamVoice incurs only 124 ms of latency for the conversion process and runs 2.4 times faster than real-time on a single A100 GPU
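
To put the speed figure in more familiar terms, the short sketch below converts "2.4 times faster than real-time" into a real-time factor (RTF). It assumes the phrase means audio duration divided by processing time, which is the usual reading but is not spelled out in this summary. The function names are hypothetical, and the 124 ms latency is a separately reported number that this arithmetic does not derive.

```python
def real_time_factor(speedup: float) -> float:
    """Return RTF = processing time / audio duration.

    Assumption: `speedup` is audio duration / processing time,
    so RTF is simply its inverse. Values below 1.0 mean the system
    keeps up with incoming audio.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def processing_time_ms(audio_ms: float, speedup: float) -> float:
    """Estimate compute time (ms) needed for `audio_ms` of source speech."""
    return audio_ms * real_time_factor(speedup)


if __name__ == "__main__":
    speedup = 2.4  # figure reported for a single A100 GPU
    print(f"RTF: {real_time_factor(speedup):.3f}")
    print(f"Compute for 1000 ms of audio: {processing_time_ms(1000, speedup):.1f} ms")
```

Under that assumption, one second of source speech would need a bit over 400 ms of compute, which leaves headroom for a streaming system to stay ahead of the incoming audio.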

Author: Janhavi Lande
