Augmenting standard speech recognition features with energy gravity centres

Authors:	Roberto Gemello, L. Moisa, Dario Albesano, Franco Mana, R. De Mori
Year of publication:	2001
Subjects:	Artificial neural network; Computer science; Computation; Speech recognition; Theoretical Computer Science; Human-Computer Interaction; Nonlinear system; Amplitude; Formant; Computer Science::Sound; Mel-frequency cepstrum; Hidden Markov model; Software; Coding (social sciences)
Source:	Computer Speech & Language, vol. 15, pp. 341-354
ISSN:	0885-2308
DOI:	10.1006/csla.2001.0171
Description:	This paper investigates the possibility of adding new features to the classical Mel-scaled cepstral coefficients (MFCCs) and their time derivatives. The experiments use a hybrid automatic speech recognition (ASR) system based on a neural network (NN) and a collection of hidden Markov models (HMMs). It is shown that the gravity centres (GCs) of the energies in the frequency bands of the first three formants, together with their first and second time derivatives, can be added to the classical set of MFCCs and their first and second time derivatives, yielding significant performance improvements. In some cases, however, the added parameters may have a negative effect on performance, because they are reliable only for certain types of sounds and their values may vary widely for the same sound in the presence of additive noise. Experiments have shown that one solution is to introduce a reliability index indicating how much weight the newly added parameters should carry in describing a given frame. NNs appear to be well suited to taking this reliability into account when computing observation probabilities. Experiments have also shown improvements when the GCs are computed from zero-crossing intervals detected at the outputs of the filters of an ear model, with intensities obtained by associating a nonlinear peak-amplitude code with each zero-crossing interval. Consistent improvements are observed when these solutions are applied with medium-size as well as large lexicons in the presence of additive noise.
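Illustrative sketch (not from the paper): the description names two building blocks, the energy gravity centre of a frequency band and its first and second time derivatives appended to an MFCC vector. The Python below is a minimal sketch under stated assumptions. It takes the gravity centre to be the energy-weighted mean frequency of the spectral bins inside a band, and it uses the common regression-style delta formula with edge frames repeated. The band limits in `FORMANT_BANDS`, the helper names, the zero-energy fallback, and the assumption that each MFCC frame already contains its own derivatives are all hypothetical choices made for illustration. The sketch does not model the paper's reliability index or its zero-crossing ear-model variant.

```python
from typing import List, Sequence, Tuple

# Hypothetical formant band limits in Hz, for illustration only (NOT the paper's values).
FORMANT_BANDS: List[Tuple[float, float]] = [
    (200.0, 900.0),    # roughly the F1 region
    (900.0, 2500.0),   # roughly the F2 region
    (2500.0, 3500.0),  # roughly the F3 region
]


def gravity_centre(freqs: Sequence[float], energies: Sequence[float],
                   lo: float, hi: float) -> float:
    """Energy-weighted mean frequency of the bins with lo <= f < hi."""
    num = 0.0
    den = 0.0
    for f, e in zip(freqs, energies):
        if lo <= f < hi:
            num += f * e
            den += e
    # Assumption: a band with no energy falls back to its midpoint.
    return num / den if den > 0.0 else 0.5 * (lo + hi)


def frame_gcs(freqs: Sequence[float], energies: Sequence[float],
              bands: Sequence[Tuple[float, float]] = FORMANT_BANDS) -> List[float]:
    """One gravity centre per band for a single frame's spectrum."""
    return [gravity_centre(freqs, energies, lo, hi) for lo, hi in bands]


def deltas(track: Sequence[float], n_win: int = 2) -> List[float]:
    """Regression-based time derivative of a per-frame feature track.

    d_t = sum_n n * (x[t+n] - x[t-n]) / (2 * sum_n n^2), with the first and
    last frames repeated at the edges.
    """
    T = len(track)
    denom = 2 * sum(n * n for n in range(1, n_win + 1))
    out = []
    for t in range(T):
        acc = 0.0
        for n in range(1, n_win + 1):
            acc += n * (track[min(t + n, T - 1)] - track[max(t - n, 0)])
        out.append(acc / denom)
    return out


def augment(mfcc_frames: Sequence[Sequence[float]],
            gc_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append GCs and their first and second derivatives to each MFCC vector.

    Assumes each MFCC frame already holds the MFCCs and their own derivatives.
    """
    n_bands = len(gc_frames[0])
    tracks = [[frame[b] for frame in gc_frames] for b in range(n_bands)]
    d1 = [deltas(tr) for tr in tracks]   # first time derivatives
    d2 = [deltas(tr) for tr in d1]       # second time derivatives
    out = []
    for t, base in enumerate(mfcc_frames):
        extra = ([tracks[b][t] for b in range(n_bands)]
                 + [d1[b][t] for b in range(n_bands)]
                 + [d2[b][t] for b in range(n_bands)])
        out.append(list(base) + extra)
    return out
```

With three bands, each frame gains nine values (three GCs plus their first and second derivatives), so a 39-dimensional MFCC vector becomes 48-dimensional in this sketch.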
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::70b0c7c9c77f732f23c8b0143a08be3c https://doi.org/10.1006/csla.2001.0171