Phonetic alignment: speech synthesis-based vs. Viterbi-based
Authors: | Thierry Dutoit, Olivier Deroo, Christophe Ris, Fabrice Malfrère |
---|---|
Year of publication: | 2003 |
Subject: |
Linguistics and Language; Language and Linguistics; Computer science; Speech recognition; Speech synthesis; Speech segmentation; Segmentation; Viterbi algorithm; Hidden Markov model; Artificial neural network; Training set; Pattern recognition; Computer Vision and Pattern Recognition; Artificial intelligence; Computation and Language; Sound; Communication; Process (computing); Modeling and Simulation; Computer Science Applications; Software |
Source: | Speech Communication. 40:503-515 |
ISSN: | 0167-6393 |
Description: | In this paper we compare two methods for automatically labeling a continuous speech database at the phonetic level, as is usually required when designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of the speech signal with a synthetic speech pattern; the second uses either a continuous-density hidden Markov model (HMM) system or a hybrid HMM/ANN (artificial neural network) system in forced-alignment mode. Both systems were evaluated on read utterances that were not part of the training set of the HMM systems, and compared to manual segmentation. This study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage that no training stage (and hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (a phonetic lattice). We derive a method for the automatic creation of large phonetically labeled speech databases, in which the synthesis-based segmentation tool bootstraps the training process of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are key to the development of improved multilingual speech synthesis and recognition systems. |
Database: | OpenAIRE |
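
The synthesis-based method in the abstract rests on temporally aligning a recorded utterance with a synthetic version whose phone boundaries are already known. The sketch below illustrates that idea with plain dynamic time warping (DTW) on toy one-dimensional features. It is not the authors' implementation: the function names `dtw_path` and `transfer_boundaries`, the Euclidean frame distance, and the feature values are all assumptions made for illustration.

```python
from math import inf


def dtw_path(ref, utt):
    """Align two sequences of feature vectors with basic DTW.

    Returns the warping path as a list of (ref_frame, utt_frame) pairs.
    """
    n, m = len(ref), len(utt)

    def dist(a, b):
        # Euclidean distance between two frames (an assumed local cost).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # cost[i][j] = best accumulated cost aligning ref[:i] with utt[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], utt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def transfer_boundaries(path, ref_labels):
    """Map known phone start frames in the reference onto the utterance.

    ref_labels is a list of (phone, start_frame_in_ref) pairs.
    Each start frame is mapped to the first utterance frame aligned with it.
    """
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [(phone, first_match[start]) for phone, start in ref_labels]


# Toy data: the "synthetic" reference has three phones of two frames each.
synthetic = [[0.0], [0.0], [5.0], [5.0], [9.0], [9.0]]
synthetic_labels = [("a", 0), ("b", 2), ("c", 4)]
# The "natural" utterance has the same phones with different durations.
natural = [[0.0], [0.0], [0.0], [5.0], [9.0], [9.0], [9.0]]

labels = transfer_boundaries(dtw_path(synthetic, natural), synthetic_labels)
print(labels)  # [('a', 0), ('b', 3), ('c', 4)]
```

A real system would use spectral feature vectors rather than scalars, and the HMM-based alternative would replace the DTW recursion with a Viterbi pass over phone models in forced-alignment mode.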