Loss Functions for Deep Monaural Speech Enhancement

Author: Christopher Schymura, Steffen Zeiler, Lea Schönherr, Jan Freiwald, Dorothea Kolossa
Year of publication: 2020
Subject:
Source: IJCNN
DOI: 10.1109/ijcnn48605.2020.9207184
Description: Deep neural networks have proven highly effective at speech enhancement, which makes them attractive not only as front-ends for machine listening and speech recognition, but also as enhancement models for the benefit of human listeners. They are, however, usually trained with loss functions that assess quality only in terms of a minimum mean squared error. This neglects the facts that human audio perception is far better described by logarithmic measures than by linear ones, that psychoacoustic hearing thresholds limit the perceptibility of many signal components in a mixture, and that a degree of signal continuity may be expected, so that sudden changes in the gain of a system may be detrimental. In the following, we cast these properties of human perception into a form that can aid the optimization of a deep neural network speech enhancement system. We explore their effects on a range of model topologies, showing the efficacy of the proposed modifications.
Database: OpenAIRE
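
Illustrative note: the three perceptual properties named in the abstract (logarithmic perception, hearing thresholds, gain continuity) can be sketched as simple loss terms. The code below is a minimal sketch under stated assumptions, not the authors' formulation: the function names, the `eps` and `threshold` parameters, and the choice to clamp magnitudes at a fixed threshold are all hypothetical, and the inputs are plain lists of non-negative spectral magnitudes or per-frame gains.

```python
import math


def mse_loss(est, ref):
    """Plain mean squared error on linear magnitudes (the usual baseline)."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)


def log_mse_loss(est, ref, eps=1e-8):
    """Mean squared error on log10 magnitudes.

    Assumption: a log-domain error stands in for 'logarithmic measures';
    eps only guards against log(0).
    """
    return sum(
        (math.log10(e + eps) - math.log10(r + eps)) ** 2
        for e, r in zip(est, ref)
    ) / len(ref)


def thresholded_log_mse_loss(est, ref, threshold, eps=1e-8):
    """Log-domain error that ignores differences below a threshold.

    Assumption: a single fixed threshold replaces a real psychoacoustic
    model. Both signals are clamped from below, so components that are
    under the threshold in both signals contribute zero error.
    """
    est_clamped = [max(e, threshold) for e in est]
    ref_clamped = [max(r, threshold) for r in ref]
    return log_mse_loss(est_clamped, ref_clamped, eps)


def gain_continuity_penalty(gains):
    """Mean squared frame-to-frame change of a gain track.

    Assumption: 'continuity' is modeled as a penalty on sudden gain jumps.
    """
    diffs = [(b - a) ** 2 for a, b in zip(gains, gains[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

With a reference of `[1.0, 0.01]` and an estimate of `[1.1, 0.11]`, both components carry the same absolute error of 0.1, so the linear MSE treats them alike, whereas the log-domain loss is dominated by the quiet component. Clamping at a threshold of 0.2 removes that quiet component from the loss entirely.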