Phonetic alignment: speech synthesis-based vs. Viterbi-based

Authors:	Thierry Dutoit, Olivier Deroo, Christophe Ris, Fabrice Malfrère
Year of publication:	2003
Subject:	Linguistics and Language; Computer science; Speech recognition; Speech synthesis; Viterbi algorithm; Language and Linguistics; Speech segmentation; Segmentation; Hidden Markov model; Training set; Artificial neural network; Communication; Process (computing); Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Pattern recognition; Computer Science Applications; ComputingMethodologies_PATTERNRECOGNITION; Computer Science::Sound; Modeling and Simulation; Computer Vision and Pattern Recognition; Artificial intelligence; Software
Source:	Speech Communication. 40:503-515
ISSN:	0167-6393
Description:	In this paper we compare two methods for automatic phonetic labeling of a continuous speech database, as usually required for designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of speech with a synthetic speech pattern; the second uses either a continuous-density hidden Markov model (HMM) system or a hybrid HMM/ANN (artificial neural network) system in forced alignment mode. Both systems have been evaluated on read utterances that were not part of the training set of the HMM systems, and compared to manual segmentation. This study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage that no training stage (hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (phonetic lattice). We deduce a method for the automatic creation of large phonetically labeled speech databases, based on using the synthetic speech segmentation tool to bootstrap the training process of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are key to the development of improved multilingual speech synthesis and recognition systems.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::0c9dc0806e68deeba996f84a9285a59b https://doi.org/10.1016/s0167-6393(02)00131-0
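
Illustration (not part of the record): the first method in the description aligns natural speech in time with a synthetic rendering of the same utterance, whose phoneme boundaries are known by construction. A minimal sketch of that idea follows, assuming plain dynamic time warping with a Euclidean frame distance over generic feature vectors. The function names dtw_path and transfer_boundaries, the toy features, and the step pattern are assumptions made for illustration; the record does not specify the features or alignment constraints used in the paper.

```python
import math


def dtw_path(ref, test):
    """Return the lowest-cost DTW path as (ref_frame, test_frame) pairs.

    ref and test are sequences of equal-dimension feature vectors.
    Assumption: Euclidean local distance and the basic three-way step pattern.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def transfer_boundaries(path, synth_boundaries):
    """Map phoneme start frames of the synthetic signal onto the natural signal.

    Each synthetic boundary frame is mapped to the first natural frame
    that the warping path pairs with it.
    """
    first_match = {}
    for ref_frame, test_frame in path:
        first_match.setdefault(ref_frame, test_frame)
    return [first_match[b] for b in synth_boundaries]


# Toy one-dimensional "features": three phonemes with values 0.0, 1.0, 2.0.
synthetic = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,)]
natural = [(0.0,), (0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
synthetic_starts = [0, 2, 4]  # known start frame of each phoneme in the synthetic signal

print(transfer_boundaries(dtw_path(synthetic, natural), synthetic_starts))
# prints [0, 3, 4]
```

No training data is involved in this sketch, which mirrors the advantage the description attributes to the synthesis-based method; the HMM and hybrid HMM/ANN systems would instead obtain boundaries through Viterbi forced alignment against trained models.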