Robust representation and efficient matching of spoken word templates
Author: | Anupam Mandal, K. R. Prasanna Kumar, Pabitra Mitra |
---|---|
Year of publication: | 2017 |
Subject: |
Dynamic time warping; Noise measurement; business.industry; Computer science; 020206 networking & telecommunications; Pattern recognition; 02 engineering and technology; 01 natural sciences; Locality-sensitive hashing; Time–frequency analysis; ComputingMethodologies_PATTERNRECOGNITION; Template; Robustness (computer science); 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Spectrogram; Artificial intelligence; Mel-frequency cepstrum; business; 010301 acoustics |
Source: | ICAPR |
Description: | Two important considerations in template-based speech recognition are (i) making the templates robust to speaker and environmental variations, and (ii) recognizing templates quickly. In this paper, a template representation is proposed based on the spectral peaks of the mel-scale magnitude spectrogram of a spoken word. Recognition is performed by finding the k closest matching representative templates in an index built using locality-sensitive hashing (LSH). The robustness of the proposed template representation is studied in terms of recognition accuracy on clean and noisy isolated spoken words. The efficiency of matching is measured as the average time taken to match a test template. The results are compared with those of templates based on mel-frequency cepstral coefficients (MFCC), matched using dynamic time warping (DTW). Under clean conditions, the proposed approach shows a 2.4% improvement in 1-best recognition performance and takes 8.7 times less time per match when all available training templates are used. The proposed representation also shows a smoother fall in recognition performance than the MFCC-based representation, indicating better noise robustness. Further, it is demonstrated that including both clean and noisy representative templates in the LSH index yields almost 3 times higher 1-best recognition accuracy under noisy conditions than the MFCC-based representation, while the time taken for a single match is still 5.8 times lower. The paper also studies how the recognition performance of the proposed method changes with the number of training templates. |
Database: | OpenAIRE |
External link: |
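
The description above outlines a pipeline of spectral-peak templates looked up in an LSH index. The following is a minimal sketch of that general idea, not the authors' implementation. Several parts are assumptions, because the record does not specify them: the spectrogram is taken as an already computed, fixed-size (bands x frames) array; peaks are simply the strongest bands in each frame; the LSH family is random-hyperplane hashing; and candidates are ranked by Hamming distance. The names `peak_template` and `HyperplaneLSH` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def peak_template(spec, n_peaks=3):
    """Binary peak map: mark the n_peaks strongest bands in each frame.

    spec is assumed to be a (bands x frames) mel-scale magnitude spectrogram
    of fixed size; the peak-picking rule is an illustrative assumption.
    """
    top = np.argsort(spec, axis=0)[-n_peaks:, :]  # strongest band indices per frame
    mask = np.zeros(spec.shape, dtype=np.float64)
    np.put_along_axis(mask, top, 1.0, axis=0)
    return mask.ravel()


class HyperplaneLSH:
    """Random-hyperplane LSH index (an assumed LSH family, for illustration)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # list of (template vector, word label)

    def _keys(self, vec):
        bits = (self.planes @ vec) > 0  # shape (n_tables, n_bits)
        return [row.tobytes() for row in bits]

    def add(self, vec, label):
        idx = len(self.items)
        self.items.append((vec, label))
        for table, key in zip(self.tables, self._keys(vec)):
            table[key].append(idx)

    def query(self, vec, k=1):
        """Return labels of up to k stored templates closest to vec."""
        candidates = set()
        for table, key in zip(self.tables, self._keys(vec)):
            candidates.update(table.get(key, []))
        # Rank only the colliding candidates by Hamming distance.
        ranked = sorted(candidates, key=lambda i: int(np.sum(self.items[i][0] != vec)))
        return [self.items[i][1] for i in ranked[:k]]


# Toy usage with synthetic "spectrograms" (40 bands x 20 frames).
rng = np.random.default_rng(0)
words = {name: rng.random((40, 20)) for name in ("yes", "no", "stop")}
index = HyperplaneLSH(dim=40 * 20)
for name, spec in words.items():
    index.add(peak_template(spec), name)

noisy = words["stop"] + 0.01 * rng.random((40, 20))  # lightly perturbed test word
print(index.query(peak_template(noisy), k=2))
```

Only templates that share a hash bucket with the query are compared, which is where the speed advantage over exhaustive matching would come from in a scheme like this. A query can return fewer than k labels, or none, if few stored templates collide with it.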