ELM speaker identification for limited dataset using multitaper based MFCC and PNCC features with fusion score
Author: | Bharath K P, Rajesh Kumar M |
---|---|
Year of publication: | 2020 |
Subject: |
Normalization (statistics); Computer Networks and Communications; Computer science; Speech recognition; 020207 software engineering; TIMIT; 02 engineering and technology; Speaker recognition; Speech processing; ComputingMethodologies_PATTERNRECOGNITION; Hardware and Architecture; Multitaper; Cepstral Mean and Variance Normalization; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Feedforward neural network; Mel-frequency cepstrum; Software; Extreme learning machine |
Source: | Multimedia Tools and Applications. 79:28859-28883 |
ISSN: | 1573-7721 1380-7501 |
DOI: | 10.1007/s11042-020-09353-z |
Description: | Speaker recognition under noisy conditions is currently a major challenge in speech processing, because background noise significantly degrades system performance. The aim of the proposed work is to identify speakers against clean and noisy backgrounds using a limited dataset. We propose multitaper-based Mel-frequency cepstral coefficient (MFCC) and power-normalized cepstral coefficient (PNCC) techniques with fusion strategies. MFCC and PNCC are computed with different multitapers to extract features from the speech samples. Cepstral mean and variance normalization (CMVN) and feature warping (FW) are then applied to normalize the features from both techniques. A low-dimensional i-vector model serves as the system model, and several score fusion strategies are used: mean, maximum, weighted sum, cumulative, and concatenated fusion. Finally, an extreme learning machine (ELM) is used for classification to increase the system identification accuracy (SIA); the ELM is a single-layer feedforward neural network that is less complex and less time-consuming than other neural networks. The proposed system is evaluated on limited data from two databases, TIMIT and SITW 2016, and SIA is measured under both clean and noisy background conditions. |
Database: | OpenAIRE |