Predicting error rates for unknown data in automatic speech recognition

Authors: Hynek Hermansky, Sri Harish Mallidi, Hendrik Kayser, Bernd Meyer
Year of publication: 2017
Subject:
Source: ICASSP
DOI: 10.1109/icassp.2017.7953174
Description: In this paper we investigate methods to predict word error rates in automatic speech recognition in the presence of unknown noise types that were not seen during training. The performance measures operate on phoneme posteriorgrams obtained from neural networks. We compare average frame-wise entropy, as a baseline measure, to the mean temporal distance (M-Measure) and to the number of phonetic events. The latter is obtained by learning typical phoneme activations from clean training data and later applying them as phoneme-specific matched filters to posteriorgrams (MaP). When the filter output exceeds a threshold, we register a phonetic event. For test sets using 10 unknown noise types and a wide range of signal-to-noise ratios, we find that M-Measure and MaP produce predictions twice as accurate as the baseline measure. When noise types that contain speech segments are excluded, a prediction error of 3.1% is achieved, compared to 15.0% for the baseline measure.
Database: OpenAIRE
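
The three measures named in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a posteriorgram stored as a NumPy array of shape (frames, phonemes) whose rows sum to 1, symmetric Kullback-Leibler divergence as the frame distance for the M-Measure (the record does not name the distance), and matched filters supplied as one 1-D template per phoneme. The function names, the EPS constant, and the threshold argument are hypothetical.

```python
import numpy as np

EPS = 1e-12  # hypothetical floor that avoids log(0)


def mean_frame_entropy(post):
    """Average per-frame entropy (in nats) of a (frames, phonemes) posteriorgram."""
    p = np.clip(post, EPS, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))


def m_measure(post, delta):
    """Mean distance between frames `delta` steps apart.

    Assumption: the distance is the symmetric KL divergence,
    KL(p||q) + KL(q||p) = sum((p - q) * (log p - log q)).
    """
    if delta < 1 or delta >= len(post):
        raise ValueError("delta must satisfy 1 <= delta < number of frames")
    p = np.clip(post[delta:], EPS, 1.0)
    q = np.clip(post[:-delta], EPS, 1.0)
    sym_kl = np.sum((p - q) * (np.log(p) - np.log(q)), axis=1)
    return float(np.mean(sym_kl))


def count_phonetic_events(post, filters, threshold):
    """Count upward threshold crossings after phoneme-specific filtering.

    `filters[k]` is a hypothetical 1-D template for phoneme k, standing in
    for the typical activation learned from clean training data.
    """
    events = 0
    for k, template in enumerate(filters):
        # Matched filtering: correlate the phoneme trajectory with its template.
        out = np.correlate(post[:, k], template, mode="same")
        above = out > threshold
        # One event per rise above the threshold (including a rise at frame 0).
        events += int(above[0]) + int(np.sum(above[1:] & ~above[:-1]))
    return events
```

In practice each measure would be computed per utterance or per test set and then mapped to a word error rate by a regression fitted on known conditions; that mapping is outside the scope of this sketch.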