2-D psychoacoustic modeling for automatic speech recognition in noisy environment

Author:	Ketan J. Raut, Sampreeta Desai, Prasad D. Khandekar
Year of publication:	2016
Subject:	Engineering; Voice activity detection; Masking threshold; business.industry; Speech recognition; Acoustic model; Mel-frequency cepstrum; Intelligibility (communication); business; Filter bank; Environmental noise; Speech processing
Source:	2016 Conference on Advances in Signal Processing (CASP).
DOI:	10.1109/casp.2016.7746151
Description:	A robust automatic speech recognition (ASR) system is commercially important, as many leading companies are racing toward industry- and consumer-level production. Environmental noise is one of the major causes of degraded speech quality: loud background sound obscures the speech, which adversely affects ASR performance. The human auditory system handles noise considerably better than a machine does. To improve ASR performance, the auditory properties of the human system are therefore studied and modeled with a psychoacoustic filter. The filter is called the 2D P-filter because its parameters take only zero or positive values. To remove noise, a masking effect is also implemented, in which sounds falling below a predetermined masking threshold are modified. An enhanced set of features is then extracted by applying this filter to the Mel filter bank. The novelty of the paper is the use of different distance metrics for classification and for testing the performance of the ASR system. Experiments are carried out on a database of studio recordings of rhyming words spoken by children with articulatory disabilities. Results in the testing phase for noisy speech signals are expected to improve considerably.
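The pipeline outlined in the description (masking threshold, a 2-D filter with zero-or-positive coefficients applied to Mel filter bank outputs, then classification under different distance metrics) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the per-frame peak-relative floor of -40 dB, the 3x3 kernel, and the choice of Euclidean, city-block, and cosine distances are all assumptions made for illustration; the record does not specify any of them. The input is assumed to be a log-Mel energy matrix of shape frames x bands, and the templates are assumed to have the same shape as the features being classified.

```python
import numpy as np


def apply_masking_floor(log_mel, floor_db=-40.0):
    """Raise bins lying below (frame peak + floor_db) up to that level.

    Assumption: the "predetermined masking threshold" is modeled as a
    fixed offset below each frame's peak energy.
    """
    frame_peak = log_mel.max(axis=1, keepdims=True)
    return np.maximum(log_mel, frame_peak + floor_db)


def filter_2d(spec, kernel):
    """Smooth spec (frames x bands) with a small non-negative kernel.

    The kernel must have odd dimensions; edges are handled by
    replicating the border values. The result is normalized by the
    kernel sum so a constant input stays constant.
    """
    kernel = np.asarray(kernel, dtype=float)
    if np.any(kernel < 0) or kernel.sum() == 0:
        raise ValueError("kernel must be non-negative with a positive sum")
    kt, kf = kernel.shape
    pt, pf = kt // 2, kf // 2
    padded = np.pad(spec, ((pt, pt), (pf, pf)), mode="edge")
    out = np.zeros(spec.shape, dtype=float)
    for i in range(kt):
        for j in range(kf):
            out += kernel[i, j] * padded[i:i + spec.shape[0], j:j + spec.shape[1]]
    return out / kernel.sum()


def distance(a, b, metric="euclidean"):
    """Distance between two equally shaped feature arrays."""
    a, b = np.ravel(a).astype(float), np.ravel(b).astype(float)
    if metric == "euclidean":
        return float(np.linalg.norm(a - b))
    if metric == "cityblock":
        return float(np.abs(a - b).sum())
    if metric == "cosine":
        return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    raise ValueError(f"unknown metric: {metric}")


def classify(features, templates, metric="euclidean"):
    """Return the label of the template nearest to features."""
    return min(templates, key=lambda label: distance(features, templates[label], metric))
```

In use, a feature matrix would pass through `apply_masking_floor` and `filter_2d` before being compared with stored word templates via `classify`, once per metric, so that recognition accuracy can be compared across metrics.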
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::695594578c180717f868fe57ad2b03f7 https://doi.org/10.1109/casp.2016.7746151