Robust representation and efficient matching of spoken word templates

Authors: Anupam Mandal, K. R. Prasanna Kumar, Pabitra Mitra
Year of publication: 2017
Subject:
Source: ICAPR
Description: Two important considerations in template-based speech recognition are (i) making the templates robust to speaker and environmental variations, and (ii) recognizing templates quickly. In this paper, a template representation is proposed based on spectral peaks of the mel-scale magnitude spectrogram of a spoken word. Recognition is performed by finding the k closest matching representative templates in an index built using locality-sensitive hashing (LSH). The robustness of the proposed template representation is studied in terms of recognition accuracy on clean and noisy isolated spoken words. The efficiency of matching is measured as the average time taken to match a test template. The results are compared with those of templates based on mel-frequency cepstral coefficients (MFCC), with dynamic time warping (DTW) used for matching. Under clean conditions, the proposed approach improves 1-best recognition performance by 2.4%, and a match takes 8.7 times less time when all available training templates are used. The proposed representation also shows a more gradual decline in recognition performance than the MFCC-based representation, indicating better noise robustness. Further, it is demonstrated that including both clean and noisy representative templates in the LSH index yields almost 3 times higher 1-best recognition accuracy under noisy conditions than the MFCC-based representation, while a single match is still 5.8 times faster. The paper also studies how the recognition performance of the proposed method changes as the number of training templates is varied.
Database: OpenAIRE
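
Illustrative note: the abstract describes retrieving the k closest templates from an LSH index but gives no implementation details. The sketch below shows one common way such a lookup can work, and is not the authors' method. It assumes each spoken-word template has already been reduced to a fixed-length numeric vector, and it uses random-hyperplane hashing with cosine similarity, which is a choice made here for illustration. The class name HyperplaneLSH and the parameters n_bits and n_tables are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index over fixed-length template vectors."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of n_bits random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = []  # (vector, label) pairs, addressed by position

    def _key(self, table_planes, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in table_planes
        )

    def add(self, vec, label):
        idx = len(self.items)
        self.items.append((vec, label))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(idx)

    def query(self, vec, k=1):
        # Collect only the templates sharing a bucket with the query,
        # then rank that short list by exact cosine similarity.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        ranked = sorted(
            candidates, key=lambda i: cosine(vec, self.items[i][0]), reverse=True
        )
        return [self.items[i][1] for i in ranked[:k]]


index = HyperplaneLSH(dim=4)
index.add([1.0, 0.0, 0.0, 0.0], "yes")
index.add([0.0, 1.0, 0.0, 0.0], "no")
best = index.query([1.0, 0.0, 0.0, 0.0], k=1)
```

Because a query is compared only against templates that share a hash bucket with it, the number of exact comparisons can stay small as the training set grows, which is the general reason an LSH lookup can be faster than running DTW against every stored template.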