Optimizing Integrated Features for Hindi Automatic Speech Recognition System
Author: | Dua Mohit, Aggarwal Rajesh Kumar, Biswas Mantosh |
---|---|
Language: | English |
Publication year: | 2018 |
Subject: | |
Source: | Journal of Intelligent Systems, Vol 29, Iss 1, Pp 959-976 (2018) |
Document type: | article |
ISSN: | 0334-1860 2191-026X |
DOI: | 10.1515/jisys-2018-0057 |
Description: | An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system using all these combinations is tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods: particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows a significant improvement over all other optimized combinations. |
Database: | Directory of Open Access Journals |
External link: |
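
The abstract reports six combinations from three feature extraction methods taken two at a time, which suggests that the order within each pair matters (3 × 2 = 6 ordered pairs rather than 3 unordered ones). The following is a minimal sketch of that count only, not the authors' implementation; the function name `ordered_pairs` and the `A-B` label format are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical helper (not from the paper): list every ordered pair of
# feature extraction methods, labelled "first-second".
def ordered_pairs(methods):
    return [f"{first}-{second}" for first, second in permutations(methods, 2)]

# The three methods named in the abstract.
pairs = ordered_pairs(["MFCC", "PLP", "GFCC"])

for label in pairs:
    print(label)
```

Treating the pairs as ordered is what separates, for example, the GF-MFCC integration from the MF-GFCC integration, which the abstract reports as two distinct results.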