Optimizing Integrated Features for Hindi Automatic Speech Recognition System

Author:	Dua Mohit, Aggarwal Rajesh Kumar, Biswas Mantosh
Language:	English
Year of publication:	2018
Subject:	automatic speech recognition; MFCC; GFCC; PSO; PLP; C-PSO; Q-PSO; HMM; Science: Electronic computers. Computer science; QA75.5-76.95
Source:	Journal of Intelligent Systems, Vol 29, Iss 1, Pp 959-976 (2018)
Document type:	article
ISSN:	0334-1860, 2191-026X
DOI:	10.1515/jisys-2018-0057
Description:	An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficients (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficients (GFCC) to extract features in the training phase. The paper first proposes sequential combinations of these three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system is tested with each of them. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. These two integrated feature vectors are then optimized using three methods: particle swarm optimization (PSO), PSO with crossover (C-PSO), and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows a significant improvement over all other optimized combinations.
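The record does not say how PSO is applied to the integrated feature vectors. As a hedged illustration of the base technique only, the sketch below is a textbook global-best PSO in plain Python that minimizes a toy objective. The function names `pso_minimize` and `sphere`, the objective itself, and the parameter values (w=0.7, c1=c2=1.5, 20 particles, 200 iterations) are assumptions for illustration. The crossover variants named in the record (C-PSO, Q-PSO) are not shown.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 20,
    iters: int = 200,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive (personal-best) coefficient (assumed value)
    c2: float = 1.5,  # social (global-best) coefficient (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Textbook global-best PSO; NOT the paper's C-PSO or Q-PSO variants."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and score.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x: List[float]) -> float:
    """Toy objective (hypothetical stand-in) with its minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, score = pso_minimize(sphere, dim=5, bounds=(-5.0, 5.0))
    print(best, score)
```

A fixed seed makes runs reproducible. In an ASR setting the objective would presumably be something like recognition error on held-out data for a given weighting or selection of the integrated features; that mapping is an assumption here, not something the record states.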
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/ac4be8d198e9475d8b0605824e364c9b