ALO-VC: Any-to-any Low-latency One-shot Voice Conversion

Authors: Wang, Bohan; Ronssin, Damien; Cernak, Milos
Publication year: 2023
Subject:
Document type: Working Paper
Description: This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs). ALO-VC enables any-to-any voice conversion using only one utterance from the target speaker, with only 47.5 ms of future look-ahead. The proposed hybrid signal processing and machine learning pipeline combines a pre-trained speaker encoder, a pitch predictor that predicts the prosody of the converted speech, and positional encoding that conveys each phoneme's location information. We introduce two system versions: ALO-VC-R, which uses a pre-trained d-vector speaker encoder, and ALO-VC-E, which improves performance by using the ECAPA-TDNN speaker encoder. The experimental results demonstrate that both ALO-VC-R and ALO-VC-E achieve performance comparable to non-causal baseline systems on the VCTK dataset and two out-of-domain datasets. Furthermore, both proposed systems can be deployed on a single CPU core with 55 ms latency and a real-time factor of 0.78. Our demo is available online.
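Note on the reported real-time factor: the sketch below is illustrative only and is not taken from the paper. It assumes the usual definition of real-time factor (processing time divided by audio duration), so a value below 1.0 means the system processes audio faster than it arrives. The function names and the 10-second example duration are hypothetical; only the 0.78 figure comes from the abstract.

```python
REPORTED_RTF = 0.78  # real-time factor stated in the abstract (single CPU core)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Return the compute time implied by a given RTF for a clip of this length."""
    return rtf * audio_seconds


def keeps_up_with_real_time(rtf: float) -> bool:
    """A streaming system keeps pace with incoming audio only if RTF is below 1.0."""
    return rtf < 1.0


if __name__ == "__main__":
    clip_seconds = 10.0  # hypothetical clip length, chosen for illustration
    needed = processing_time(REPORTED_RTF, clip_seconds)
    print(f"Compute time for {clip_seconds:.0f} s of audio: {needed:.1f} s")
    print(f"Keeps up with real time: {keeps_up_with_real_time(REPORTED_RTF)}")
```

Under this definition, an RTF of 0.78 leaves roughly 22% of the single core's time budget unused while streaming.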
Comment: Accepted to Interspeech 2023. Some audio samples are available at https://bohan7.github.io/ALO-VC-demo/
Database: arXiv