Combining evidences from excitation source and vocal tract system features for Indian language identification using deep neural networks

Authors: Ravi Kumar Vuddagiri, Anil Kumar Vuppala, Mounika Kamsali Veera, Suryakanth V. Gangashetty
Year of publication: 2017
Subject:
Source: International Journal of Speech Technology. 21:501-508
ISSN: 1572-8110
1381-2416
Description: In this paper, a combination of excitation source information and vocal tract system information is explored for the task of language identification (LID). The excitation source information is represented by residual cepstral coefficients (RCC), features extracted from the linear prediction (LP) residual signal. The vocal tract system information is represented by mel frequency cepstral coefficients (MFCC). To incorporate additional temporal information, shifted delta cepstra (SDC) are computed. LID systems are built using SDC over the MFCC and RCC features individually and evaluated by their equal error rate (EER). Experiments were performed on a dataset of 13 Indian languages, with about 115 h of speech for training and 30 h for testing, using a deep neural network (DNN), a DNN with attention (DNN-WA), and a state-of-the-art i-vector system. DNN-WA outperforms the baseline i-vector system. EERs of 9.93% and 6.25% are achieved using RCC and MFCC features, respectively. Combining evidence from both feature sets with a late fusion mechanism yields an EER of 5.76%. This result indicates that excitation source information is complementary to the widely used vocal tract system information for the task of LID.
Database: OpenAIRE
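
Illustration: the description reports EERs for the individual MFCC and RCC systems and for their late fusion, but the record does not say how the scores were fused or how the EER was computed. The following is a minimal sketch under stated assumptions, not the authors' implementation: fusion is assumed to be a weighted sum of per-trial scores, the EER is approximated by sweeping thresholds over the observed scores, and the names `fuse_scores`, `equal_error_rate`, and `weight` are hypothetical.

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Late fusion as a weighted sum of two systems' per-trial scores.

    Assumption: the record does not specify the fusion rule; a weighted
    sum is one common choice. `weight` applies to system A.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(scores_a, scores_b)]


def equal_error_rate(scores, labels):
    """Approximate EER for detection trials.

    labels: 1 for a target trial (claimed language is correct), 0 otherwise.
    Sweeps each observed score as a threshold and returns the mean of the
    false-accept and false-reject rates where the two are closest.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    for thr in sorted(set(scores)):
        far = sum(s >= thr for s in nontargets) / len(nontargets)  # false accepts
        frr = sum(s < thr for s in targets) / len(targets)         # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the paper.
    labels = [1, 1, 0, 0]
    system_a = [0.9, 0.4, 0.6, 0.1]   # stands in for one feature stream
    system_b = [0.3, 0.9, 0.1, 0.5]   # stands in for the other stream
    fused = fuse_scores(system_a, system_b)
    for name, s in [("A", system_a), ("B", system_b), ("fused", fused)]:
        print(name, equal_error_rate(s, labels))
```

In this toy setup each system misranks a different trial, so averaging the scores separates targets from non-targets better than either system alone, which is the kind of complementarity the description refers to. In practice the fusion weight would be tuned on held-out data.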