Authors: |
Bouchakour, Lallouani; Debyeche, Mohamed |
Subject: |
|
Source: |
International Journal of Speech Technology; Mar 2022, Vol. 25, Issue 1, pp. 269-277 (9 pp.) |
Abstract: |
The performance of continuous automatic speech recognition systems (CASRS) in network communications degrades rapidly in the presence of speech signal variability caused by noisy environments, the communication channel, and the speech codec. Several techniques have been proposed to improve recognition accuracy. An ASR system consists of two main processing steps: feature extraction (front end) and classification (back end). We are motivated to develop speech separation algorithms (feature enhancement) that improve both the intelligibility of noisy speech and the accuracy of ASR. We use non-negative matrix factorization and the ideal binary mask, estimated by a deep neural network (DNN), to exploit the spectro-temporal structure of magnitude spectrograms for supervised speech separation. The ASR is based on a convolutional neural network whose input is log-Mel cepstrum features. The system was trained on 440 sentences from 20 speakers, taken from an AMR-NB-encoded database and contaminated with noise at signal-to-noise ratios of 0 dB, 5 dB and 10 dB. [ABSTRACT FROM AUTHOR] |
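Two of the operations named in the abstract can be illustrated with a minimal sketch: scaling a noise signal so that a mixture reaches a target signal-to-noise ratio, and computing an ideal binary mask over magnitude spectrogram units. This is not the authors' implementation. The function names, the list-based data layout, and the 0 dB local criterion are assumptions made for illustration, and the mask here is the oracle target computed from known speech and noise, whereas the abstract describes estimating it with a DNN.

```python
import math


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Scale noise so the speech-to-noise power ratio equals target_snr_db.

    Hypothetical helper: assumes both signals are non-silent lists of floats.
    """
    ratio = 10 ** (target_snr_db / 10)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * ratio))
    return [gain * n for n in noise]


def measure_snr_db(speech, noise):
    """Signal-to-noise ratio in dB between two sample sequences."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Oracle mask: 1 where the local SNR of a time-frequency unit exceeds lc_db.

    speech_mag and noise_mag are magnitude spectrograms given as lists of
    frames, each frame a list of non-negative magnitudes. The default local
    criterion of 0 dB is an assumption, not a value taken from the abstract.
    """
    eps = 1e-12  # avoids log of zero and division by zero
    mask = []
    for s_frame, n_frame in zip(speech_mag, noise_mag):
        row = []
        for s, n in zip(s_frame, n_frame):
            local_snr = 20 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr > lc_db else 0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    for target in (0, 5, 10):  # SNR levels listed in the abstract
        scaled = scale_noise_to_snr(speech, noise, target)
        mixture = [s + n for s, n in zip(speech, scaled)]
        print(target, round(measure_snr_db(speech, scaled), 6), mixture)
```

Applying the mask to a noisy magnitude spectrogram is an element-wise multiplication that keeps the units dominated by speech and zeroes the rest.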
Database: |
Complementary Index |
External link: |
|