Augmenting standard speech recognition features with energy gravity centres

Authors: Roberto Gemello, L. Moisa, Dario Albesano, Franco Mana, R. De Mori
Year of publication: 2001
Subject:
Source: Computer Speech & Language. 15:341-354
ISSN: 0885-2308
DOI: 10.1006/csla.2001.0171
Description: This paper investigates the possibility of adding new features to the classical Mel-scaled cepstral coefficients (MFCCs) and their time derivatives. A hybrid Automatic Speech Recognition (ASR) system is used, based on a Neural Network (NN) and a collection of Hidden Markov Models (HMMs). It is shown that the gravity centres (GCs) of energies in the frequency bands of the first three formants, together with their first and second time derivatives, can be added to the classical set of MFCCs and their first and second time derivatives, resulting in significant performance improvements. Nevertheless, in some cases the added parameters may have a negative effect on performance, because they are reliable only for certain types of sounds: their values may exhibit large variations for the same sound in the presence of additive noise. Experiments have shown that one solution is to introduce a reliability index indicating the importance the newly added parameters should have in describing a given frame. NNs appear to be suitable devices for taking this index into account in the computation of observation probabilities. Experiments have also shown improvements when GCs are computed from zero-crossing intervals detected at the output of the filters of an ear model. Intensities are obtained by associating a nonlinear peak amplitude coding with each zero-crossing interval. Consistent improvements are observed when the above-mentioned solutions are applied with medium as well as large lexicons in the presence of additive noise.
Database: OpenAIRE
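
Note: The energy gravity centre mentioned in the description can be illustrated as an energy-weighted mean frequency within a band. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `band_gravity_centre`, the Hamming window, the FFT-based power spectrum, and the example band edges are all hypothetical choices made for illustration. The record does not give the actual band limits, and the paper's ear-model and zero-crossing variant is not reproduced here.

```python
import numpy as np


def band_gravity_centre(frame, sample_rate, f_lo, f_hi):
    """Energy-weighted mean frequency (Hz) of one frame within [f_lo, f_hi).

    Illustrative only: window type and spectrum estimate are assumptions.
    """
    frame = np.asarray(frame, dtype=float)
    # Short-time power spectrum of the windowed frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep only the bins that fall inside the requested band.
    mask = (freqs >= f_lo) & (freqs < f_hi)
    band_energy = power[mask].sum()
    if band_energy <= 0.0:
        # No energy in the band: fall back to the band midpoint.
        return 0.5 * (f_lo + f_hi)
    return float((freqs[mask] * power[mask]).sum() / band_energy)


# Hypothetical band edges (Hz) loosely covering the first three formants.
EXAMPLE_BANDS = [(200.0, 900.0), (900.0, 2500.0), (2500.0, 3500.0)]


def gravity_centres(frame, sample_rate, bands=EXAMPLE_BANDS):
    """Return one gravity centre per band for a single frame."""
    return [band_gravity_centre(frame, sample_rate, lo, hi) for lo, hi in bands]
```

Applied frame by frame, the three values could be appended to each MFCC vector, and their first and second time derivatives computed in the same way as for the cepstral coefficients.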