An Investigation of LSTM-CTC based Joint Acoustic Model for Indian Language Identification

Authors:	Ravi Kumar Vuddagiri, Anil Kumar Vuppala, Hari Krishna Vydana, Tirusha Mandava
Publication year:	2019
Subject:	Identification (information); Indian English; End-to-end principle; Computer science; Speech recognition; Cepstrum; Feature (machine learning); language; Word error rate; Acoustic model; Joint (audio engineering); language.human_language
Source:	ASRU
DOI:	10.1109/asru46091.2019.9003784
Description:	In this paper, phonetic features derived from the joint acoustic model (JAM) of a multilingual end-to-end automatic speech recognition system are proposed for Indian language identification (LID). These features utilize contextual information learned by the JAM through a long short-term memory-connectionist temporal classification (LSTM-CTC) framework and are therefore referred to as CTC features. A multi-head self-attention network is trained on these features; it aggregates the frame-level features by selecting prominent frames through a parametrized attention layer. The proposed features have been tested on the IIITH-ILSC database, which consists of 22 official Indian languages and Indian English. Experimental results demonstrate that CTC features outperformed i-vector and phonetic temporal neural LID systems, achieving an equal error rate of 8.70%. Fusing the shifted delta cepstral and CTC feature-based LID systems at the model level and at the feature level further improved performance.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::b11a327e6cbb4abfa19011a44d0dc0ad https://doi.org/10.1109/asru46091.2019.9003784
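
The description mentions aggregating frame-level features by weighting prominent frames through a parametrized attention layer. The following is a minimal NumPy sketch of that general idea (multi-head attentive pooling), not the authors' network: the function name `multi_head_attention_pool`, the parameters `W` and `v`, all dimensions, and the random weights are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention_pool(frames, W, v):
    """Pool frame-level features into one utterance-level vector.

    frames: (T, D) frame-level features (hypothetical stand-in for CTC features)
    W:      (H, D, A) per-head projection matrices (assumed parametrization)
    v:      (H, A) per-head scoring vectors (assumed parametrization)
    Returns the concatenated head outputs (H * D,) and the weights (T, H).
    """
    # Project every frame for every head: (H, T, A)
    hidden = np.tanh(np.einsum("td,hda->hta", frames, W))
    # One scalar score per frame and head: (T, H)
    scores = np.einsum("hta,ha->th", hidden, v)
    # Normalize over time so each head's weights sum to 1
    weights = softmax(scores, axis=0)
    # Weighted sum of frames per head: (H, D), then concatenate heads
    pooled = weights.T @ frames
    return pooled.reshape(-1), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, H, A = 200, 40, 4, 16  # illustrative sizes, not taken from the paper
    frames = rng.standard_normal((T, D))
    W = 0.1 * rng.standard_normal((H, D, A))
    v = rng.standard_normal((H, A))
    utterance_vector, weights = multi_head_attention_pool(frames, W, v)
    print(utterance_vector.shape)  # (H * D,)
    print(weights.sum(axis=0))     # each head's weights sum to 1 over time
```

In a trained system the parameters would be learned jointly with a language classifier operating on the pooled vector; here they are random, so the weights carry no meaning beyond showing the shapes and the normalization over time.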