Loss Functions for Deep Monaural Speech Enhancement
Author: | Christopher Schymura, Steffen Zeiler, Lea Schönherr, Jan Freiwald, Dorothea Kolossa |
---|---|
Year of publication: | 2020 |
Subject: |
Machine listening
Minimum mean square error, Artificial neural network, Computer science, Speech recognition, Monaural, Speech enhancement, Perception, Psychoacoustics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020206 networking & telecommunications; 03 medical and health sciences; 0305 other medical science; 030507 speech-language pathology & audiology |
Source: | IJCNN |
DOI: | 10.1109/ijcnn48605.2020.9207184 |
Description: | Deep neural networks have proven highly effective at speech enhancement, which makes them attractive not just as front-ends for machine listening and speech recognition, but also as enhancement models for the benefit of human listeners. They are, however, usually trained on loss functions that assess quality only in terms of a minimum mean squared error. This neglects the fact that human audio perception is far better described by logarithmic measures than by linear ones, that psychoacoustic hearing thresholds limit the perceptibility of many signal components in a mixture, and that a degree of signal continuity may also be expected, so that sudden changes in the gain of a system may be detrimental. In the following, we cast these properties of human perception into a form that can aid the optimization of a deep neural network speech enhancement system. We explore their effects on a range of model topologies, showing the efficacy of the proposed modifications. |
Database: | OpenAIRE |
External link: |
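
The three perceptual properties named in the description (logarithmic perception, hearing thresholds, and continuity of the gain) can be illustrated with a small sketch. This is not the formulation used in the paper, which the record does not reproduce. All function names, the `eps` constant, and the exact form of each term are assumptions chosen only to make the ideas concrete on plain lists of spectral magnitudes.

```python
import math


def mse_loss(est, ref):
    """Plain mean squared error between two magnitude sequences."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)


def log_mse_loss(est, ref, eps=1e-8):
    """MSE between log-magnitudes; eps is an assumed constant that avoids log(0)."""
    return sum(
        (math.log(e + eps) - math.log(r + eps)) ** 2 for e, r in zip(est, ref)
    ) / len(ref)


def audible_mse_loss(est, ref, thresholds):
    """MSE that skips bins where both magnitudes lie below a per-bin threshold.

    Hypothetical stand-in for a psychoacoustic hearing threshold: errors in
    components assumed to be inaudible contribute nothing to the loss.
    """
    total = 0.0
    for e, r, t in zip(est, ref, thresholds):
        if e >= t or r >= t:
            total += (e - r) ** 2
    return total / len(ref)


def gain_smoothness_penalty(gains):
    """Mean squared difference between consecutive gains (assumed continuity term)."""
    if len(gains) < 2:
        return 0.0
    return sum((b - a) ** 2 for a, b in zip(gains, gains[1:])) / (len(gains) - 1)


# Two estimates with the same absolute error, placed on a loud or a quiet bin.
ref = [1.0, 0.01]
est_loud_error = [1.1, 0.01]
est_quiet_error = [1.0, 0.11]

print(mse_loss(est_loud_error, ref), mse_loss(est_quiet_error, ref))
print(log_mse_loss(est_loud_error, ref), log_mse_loss(est_quiet_error, ref))
```

In this toy case the linear MSE rates both estimates as practically identical, while the log-domain loss penalizes the error on the quiet bin far more heavily, because it is a much larger relative change. In a training setup, terms like these would typically be combined as a weighted sum, with the weights treated as tunable hyperparameters.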