Streaming ASR Encoder for Whisper-to-Speech Online Voice Conversion

Authors: Anastasia Avdeeva, Aleksei Gusev, Tseren Andzhukaev, Artem Ivanov
Language: English
Year of publication: 2024
Subject:
Source: IEEE Open Journal of Signal Processing, Vol 5, Pp 160-167 (2024)
Document type: article
ISSN: 2644-1322
DOI: 10.1109/OJSP.2023.3343342
Description: Whispered speech is a quiet voice without vocalization. One of the common cases of using whispered speech is a technique that can help overcome stuttering. But whispered speech can be uncomfortable and difficult to understand in everyday communication. To address these problems, we propose a method of low-delayed whisper-to-speech voice conversion, which can be useful in real-life communication of people with disordered speech. As part of our research, we study the impact of streaming Automatic Speech Recognition models on the quality of voice conversion, comparing different streaming models and methods for model adaptation to streaming settings, and showing the importance of using such models in cases of low-delayed voice conversion.
Database: Directory of Open Access Journals