Automatic selection of phonetically distributed sentence sets for speaker adaptation with application to large vocabulary Mandarin speech recognition

Author: Hsin-Min Wang, Jia-Lin Shen, Lin-Shan Lee, Ren-Yuan Lyu
Year of publication: 1999
Subject:
Source: Computer Speech & Language. 13:79-97
ISSN: 0885-2308
DOI: 10.1006/csla.1998.0112
Description: This paper presents an approach to the automatic selection of phonetically distributed sentence sets for speaker adaptation and applies it to Mandarin speech recognition with a very large vocabulary. This offers a different approach to the problem of selecting adaptation data. A computer algorithm is developed to select minimum sets of phonetically distributed training sentences from a text corpus that defines the desired task. These sentence sets not only contain an almost minimal number of words and sentences covering the desired acoustic units, but also have statistical distributions of these acoustic-phonetic units very close to those in the given task corpus. In this way, more frequently used units can be trained with higher accuracy, improving overall performance, while a new user needs to produce only a small number of meaningful sentences to train the recognizer. Different sentence sets, selected with different phonetic criteria that take into account the statistics of the different acoustic units in the given corpus, can then be integrated into a multi-stage adaptation procedure, in which recognition performance improves incrementally, stage by stage, with the adaptation data produced from these sentence sets. The proposed approach is applied to an example task of very-large-vocabulary Mandarin speech recognition in both isolated-syllable and continuous-speech modes, with the continuous-speech experiments covering different subject domains. Although the primary results in this paper are for this example task, many of the concepts and techniques developed here are expected to be useful for other speaker adaptation problems and other languages.
Database: OpenAIRE
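The abstract does not spell out the selection algorithm itself. The following is only a minimal greedy sketch of the general idea, assuming each sentence is given as a non-empty list of acoustic-unit labels (for example Mandarin syllables). It is not the authors' algorithm: the helper names `distribution`, `kl_divergence` and `select_sentences` are hypothetical, coverage gain per unit token stands in for "almost minimum" size, and KL divergence is a swapped-in measure of how close the selected unit distribution is to the corpus distribution.

```python
from collections import Counter
import math


def distribution(counts):
    """Normalize a Counter of unit occurrences into relative frequencies."""
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}


def kl_divergence(p, q, eps=1e-6):
    """D(p || q); eps (an assumed smoothing constant) keeps unseen units finite."""
    return sum(pv * math.log(pv / (q.get(u, 0.0) + eps)) for u, pv in p.items())


def select_sentences(corpus):
    """Greedily pick sentence indices until every unit in the corpus is covered.

    Hypothetical criteria: the primary key is the number of newly covered units
    per unit token (favouring short sentences); ties are broken by the smaller
    divergence from the corpus unit distribution, then by the lower index.
    """
    target = distribution(Counter(u for s in corpus for u in s))
    covered, counts, chosen = set(), Counter(), []
    remaining = set(range(len(corpus)))
    while covered != set(target) and remaining:
        def key(i):
            s = corpus[i]
            gain = len(set(s) - covered) / len(s)
            div = kl_divergence(target, distribution(counts + Counter(s)))
            return (gain, -div, -i)

        best = max(remaining, key=key)
        remaining.discard(best)
        chosen.append(best)
        covered |= set(corpus[best])
        counts += Counter(corpus[best])
    return chosen


# Toy corpus of syllable sequences (illustrative data, not from the paper).
corpus = [
    ["ma", "ma"],
    ["ma", "ni", "hao"],
    ["ni", "hao"],
    ["shi", "ma"],
]
print(select_sentences(corpus))
```

On this toy corpus, sentences 1, 2 and 3 all cover new units at the same rate in the first step, so the divergence tie-break decides: sentence 1 leaves the selected distribution closest to the corpus distribution. A real system would also need the multi-stage step, running the selection separately for each phonetic criterion (each choice of acoustic unit).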