Predicting error rates for unknown data in automatic speech recognition

Authors:	Hynek Hermansky, Sri Harish Mallidi, Hendrik Kayser, Bernd Meyer
Year of publication:	2017
Subject:	Training set; Artificial neural network; Noise measurement; Computer science; Mean squared prediction error; Speech recognition; Matched filter; Word error rate; Pattern recognition; 01 natural sciences; 03 medical and health sciences; 0302 clinical medicine; Signal-to-noise ratio; 0103 physical sciences; Entropy (information theory); Artificial intelligence; 010301 acoustics; 030217 neurology & neurosurgery
Source:	ICASSP
DOI:	10.1109/icassp.2017.7953174
Description:	In this paper we investigate methods to predict word error rates in automatic speech recognition in the presence of unknown noise types that were not seen during training. The performance measures operate on phoneme posteriorgrams obtained from neural nets. We compare average frame-wise entropy, as a baseline measure, to the mean temporal distance (M-Measure) and to the number of phonetic events. The latter is obtained by learning typical phoneme activations from clean training data, which are later applied as phoneme-specific matched filters to posteriorgrams (MaP). When the filtered output exceeds a threshold, we register this as a phonetic event. For test sets using 10 unknown noise types and a wide range of signal-to-noise ratios, we find M-Measure and MaP to produce predictions twice as accurate as the baseline measure. When excluding noise types that contain speech segments, a prediction error of 3.1% is achieved, compared to 15.0% for the baseline measure.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::0c4835198b413ace7454a4c9d66de8ce https://doi.org/10.1109/icassp.2017.7953174
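
The description names two posteriorgram-based measures, average frame-wise entropy and mean temporal distance, without giving formulas. The following is a minimal sketch of how such measures could be computed, not the authors' implementation. It assumes a posteriorgram is a list of frames, each a list of phoneme posterior probabilities summing to 1. The function names, the `eps` floor, and the choice of symmetric Kullback-Leibler divergence as the distance between frames are assumptions made for illustration; the record does not specify them.

```python
import math


def mean_framewise_entropy(post, eps=1e-12):
    """Average Shannon entropy (in nats) over all frames of a posteriorgram.

    post: list of frames, each a list of posterior probabilities.
    eps:  floor applied inside the logarithm to avoid log(0) (assumption).
    """
    total = 0.0
    for frame in post:
        total += -sum(p * math.log(max(p, eps)) for p in frame)
    return total / len(post)


def mean_temporal_distance(post, dt, eps=1e-12):
    """Mean distance between frames that are dt steps apart.

    The distance is the symmetric KL divergence, sum((p - q) * (log p - log q)),
    which is an assumed choice; the record only names the measure.
    """
    if not 1 <= dt < len(post):
        raise ValueError("dt must satisfy 1 <= dt < number of frames")
    dists = []
    for a, b in zip(post, post[dt:]):
        d = 0.0
        for p, q in zip(a, b):
            p, q = max(p, eps), max(q, eps)
            d += (p - q) * (math.log(p) - math.log(q))
        dists.append(d)
    return sum(dists) / len(dists)


# Toy posteriorgrams (made-up values, for illustration only).
uniform = [[0.25, 0.25, 0.25, 0.25]] * 5        # maximally uncertain frames
alternating = [[0.9, 0.1], [0.1, 0.9]] * 3      # class flips every frame

entropy_uniform = mean_framewise_entropy(uniform)
dist_dt1 = mean_temporal_distance(alternating, 1)
dist_dt2 = mean_temporal_distance(alternating, 2)
```

In this toy setup, the uniform posteriorgram yields the maximum entropy for four classes, and the alternating posteriorgram shows a large distance at a lag of one frame and zero distance at a lag of two. How either quantity is mapped to a predicted word error rate is not described in the record and is not sketched here.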