Acoustic models of the elderly for large-vocabulary continuous speech recognition

Authors: Akira Baba, Akinobu Lee, Shinichi Yoshizawa, Miichi Yamada, Kiyohiro Shikano
Year of publication: 2004
Subject:
Source: Electronics and Communications in Japan (Part II: Electronics). 87:49-57
ISSN: 1520-6432, 8756-663X
DOI: 10.1002/ecjb.20101
Description: Large-vocabulary continuous speech recognition systems have recently come into widespread use, encouraging the application of speech recognition techniques to a variety of problems. One factor that degrades the performance of such systems is a mismatch between the acoustic properties of the user's speech and the acoustic model. Acoustic models are generally trained on the speech of young or middle-aged adults, so a mismatch arises between the model and the acoustic properties of elderly speech, which may lower the recognition rate. In this study, a large-scale elderly speech database (200 sentences × 301 subjects) is used to train an acoustic model, and the resulting elderly acoustic model is evaluated with a large-vocabulary continuous speech recognition system. In the experiments, the word recognition rate improved by 3 to 5% compared with an acoustic model trained on young or middle-aged adult speech, namely the JNAS speech database (150 sentences × 260 subjects, average age 28.6 years). It is also verified experimentally that speaker adaptation to elderly speech improves the recognition rate further when it starts from an acoustic model trained on elderly speech. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 87(7): 49–57, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20101
Database: OpenAIRE