An Initial Study on Minimum Phone Error Discriminative Learning of Acoustic Models for Mandarin Large Vocabulary Continuous Speech Recognition

Author: Jen-Wei Kuo, 郭人瑋
Year of publication: 2005
Document type: thesis
Description: 93
Discriminative training of acoustic models has been an active research focus in automatic speech recognition (ASR) in the past few years. This thesis extensively investigates Minimum Phone Error (MPE) approaches to discriminative training and adaptation of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). All experiments were carried out on the Mandarin broadcast news corpus (MATBN). The experimental results show that MPE training gives significant improvements over baseline systems whose acoustic models were trained under the Maximum Likelihood (ML) or Maximum Mutual Information (MMI) principles. Compared with the ML-trained acoustic models, the MPE-trained models yielded relative reductions of 15.52% in syllable error rate (SER), 12.33% in character error rate (CER), and 10.02% in word error rate (WER). Unsupervised adaptation of acoustic models via MPE-trained linear transformations in either the model space or the feature space was also studied, with promising results. However, because no correct reference transcript is available for accuracy calculation in the unsupervised setting and only the top-one automatic transcript can be used instead, the unsupervised MPE-based adaptation techniques may not always accumulate good estimates for the acoustic model parameters, and their performance can then be substantially degraded. To tackle this problem, this thesis proposes a novel Raw Accuracy Prediction Model (RAPM) to improve the MPE-based adaptation techniques; slight performance gains were demonstrated in initial experiments.
Database: Networked Digital Library of Theses & Dissertations