Combining evidences from excitation source and vocal tract system features for Indian language identification using deep neural networks

Authors:	Ravi Kumar Vuddagiri, Anil Kumar Vuppala, Mounika Kamsali Veera, Suryakanth V. Gangashetty
Year of publication:	2017
Subjects:	Linguistics and Language; Artificial neural network; Language identification; Computer science; Speech recognition; Word error rate; 020206 networking & telecommunications; Linear prediction; 02 engineering and technology; Residual; Language and Linguistics; Human-Computer Interaction; 030507 speech-language pathology & audiology; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; Computer Vision and Pattern Recognition; Mel-frequency cepstrum; 0305 other medical science; Software; Vocal tract; Fusion mechanism
Source:	International Journal of Speech Technology, 21:501-508
ISSN:	1572-8110, 1381-2416
Description:	In this paper, a combination of excitation source information and vocal tract system information is explored for the task of language identification (LID). The excitation source information is represented by residual cepstral coefficients (RCC), features extracted from the linear prediction (LP) residual signal. Vocal tract system information is represented by mel-frequency cepstral coefficients (MFCC). To incorporate additional temporal information, shifted delta cepstra (SDC) are computed. LID systems are built using SDC over the MFCC and RCC features individually and evaluated in terms of equal error rate (EER). Experiments have been performed on a dataset of 13 Indian languages, with about 115 h for training and 30 h for testing, using a deep neural network (DNN), a DNN with attention (DNN-WA) and a state-of-the-art i-vector system. DNN-WA outperforms the baseline i-vector system. EERs of 9.93% and 6.25% are achieved using RCC and MFCC features, respectively. Combining the evidence from both features with a late fusion mechanism yields an EER of 5.76%. This result indicates that excitation source information is complementary to the widely used vocal tract system information for the task of LID.
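Shifted delta cepstra are commonly parameterised as N-d-P-k: N cepstral coefficients per frame, a delta spread of d frames, a shift of P frames between blocks, and k stacked blocks. The minimal sketch below shows how such features could be computed from per-frame MFCC or RCC vectors. It assumes the widely used 7-1-3-7 setting and repeats edge frames at the utterance boundaries. The record does not state the configuration used in the paper, and the function name `shifted_delta_cepstra` is hypothetical.

```python
from typing import List

Frame = List[float]


def shifted_delta_cepstra(cepstra: List[Frame], n: int = 7, d: int = 1,
                          p: int = 3, k: int = 7) -> List[Frame]:
    """Compute SDC features (hypothetical helper; 7-1-3-7 defaults are an assumption).

    cepstra: T frames, each holding at least n cepstral coefficients
    (e.g. MFCC or RCC). Returns T frames of n * k SDC values.
    """
    num_frames = len(cepstra)

    def frame_at(idx: int) -> Frame:
        # Clamp to the valid range so edge frames are repeated (assumed padding choice).
        return cepstra[min(max(idx, 0), num_frames - 1)]

    sdc: List[Frame] = []
    for t in range(num_frames):
        features: Frame = []
        for i in range(k):
            centre = t + i * p
            ahead, behind = frame_at(centre + d), frame_at(centre - d)
            # Block i holds c(t + iP + d) - c(t + iP - d) for the first n coefficients.
            features.extend(ahead[j] - behind[j] for j in range(n))
        sdc.append(features)
    return sdc


# Toy input: 20 frames whose 7 coefficients all equal the frame index.
toy = [[float(t)] * 7 for t in range(20)]
out = shifted_delta_cepstra(toy)
print(len(out), len(out[0]))  # 20 frames, 7 * 7 = 49 values each
```

Each output frame concatenates k deltas taken progressively further into the future, which is how SDC adds longer-span temporal context than a single delta does.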
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::4f795a460ef2df496551356e943b52b2 https://doi.org/10.1007/s10772-017-9481-6