Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
Authors: | Tomoki Toda, Hiroshi Saruwatari, Hironori Doi, Keigo Nakamura, Kiyohiro Shikano |
---|---|
Language: | English |
Year of publication: | 2010 |
Subject: | Audio signal; Voice activity detection; Voice conversion; Computer science; Speech recognition; Laryngectomees; Esophageal speech; PSQM; Intelligibility (communication); Speech processing; Linear predictive coding; Voice analysis; Speech enhancement; Artificial Intelligence; Hardware and Architecture; Eigenvoice conversion; Computer Vision and Pattern Recognition; Electrical and Electronic Engineering; Sound quality; Software |
Source: | IEICE Transactions on Information and Systems. (9):2472-2481 |
ISSN: | 0916-8532 |
Description: | This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it does not require any external devices, the generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech. |
Database: | OpenAIRE |
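
The description states that target speech parameters are estimated from a source spectral parameter using Gaussian mixture models, but this record does not give the estimator itself. As a rough illustration only, the sketch below assumes a joint GMM over stacked source/target vectors and uses the frame-wise conditional mean E[y | x] as the converted feature; the paper's actual estimation procedure may differ. The function name `gmm_convert` and all parameter values are hypothetical.

```python
import numpy as np

def gmm_convert(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y] (assumed mapping rule).

    weights: mixture weights, one per component
    means:   joint mean vectors, each of length dx + dy
    covs:    joint full covariance matrices, each (dx + dy) x (dx + dy)
    dx:      dimensionality of the source part x
    """
    x = np.asarray(x, dtype=float)
    log_post, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        S = np.asarray(S, dtype=float)
        mu_x, mu_y = mu[:dx], mu[dx:]
        S_xx, S_yx = S[:dx, :dx], S[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)          # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # Log of w * N(x; mu_x, S_xx), the unnormalized component posterior.
        log_post.append(np.log(w) - 0.5 * (diff @ sol + logdet + dx * np.log(2 * np.pi)))
        # Conditional mean of y given x for this component.
        cond_means.append(mu_y + S_yx @ sol)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())       # stabilized normalization
    post /= post.sum()
    # Posterior-weighted sum of the per-component conditional means.
    return post @ np.array(cond_means)

# Toy usage with made-up one-dimensional source and target features.
toy = gmm_convert([2.0], [1.0], [[0.0, 1.0]], [[[1.0, 0.5], [0.5, 1.0]]], dx=1)
```

In a real system the GMM parameters would be trained on time-aligned pairs of esophageal and normal speech features, and separate models would be needed for the spectral and excitation parameters mentioned in the description.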