A study on attention-based objective function in deep denoising autoencoder based speech enhancement

Author: Shih-Kuang Lee, Kuo-Hsuan Hung, Ying-Hui Lai, Hsiang-Ping Hsu, Yi-Ying Kao, Yu Tsao, Chen-Yu Chiang
Year of publication: 2019
Subject:
Source: The Journal of the Acoustical Society of America. 146:2794-2794
ISSN: 0001-4966
Description: Speech is one of the most direct and convenient human-machine interfaces. In real-world scenarios, however, various interferences and noises may degrade the speech signal and thus reduce speech quality and intelligibility. Therefore, speech enhancement (SE) is an essential component of speech-communication systems. Recently, numerous deep-learning-based SE approaches have been proposed and have yielded satisfactory performance. In a deep-learning-based SE system, defining a proper objective function plays a crucial role in its success. Generally, the mean square error (MSE) between the predicted and desired outputs is used as the objective function for learning the parameters of deep-learning models. Because a sequence of speech signals contains various patterns, such as consonants, vowels, beginning and ending silences, and short pauses, simply using the MSE as the objective function is not optimal, since the contributions of these different patterns may be averaged out. Instead, specific weights should be applied to distinct patterns when designing the objective function. In this presentation, we present a novel objective function for a deep denoising autoencoder-based SE system. The proposed objective function is derived by multiplying the MSE by a ratio calculated from the clean and noisy speech. The results are evaluated using standardized evaluation metrics, and the experimental results confirm that the proposed objective function helps improve the intelligibility of enhanced speech.
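The idea of weighting the MSE by a ratio of clean to noisy speech can be illustrated with a minimal sketch. The abstract does not specify how the ratio is computed, so everything below is an assumption made for illustration only: the features are taken to be magnitude spectra arranged as frames by frequency bins, the weight is a per-frame clean-to-noisy energy ratio clipped to [0, 1], and the names `frame_weights`, `weighted_mse`, and `plain_mse` are hypothetical. It is not the authors' formulation.

```python
import numpy as np


def frame_weights(clean, noisy, eps=1e-8):
    """Hypothetical per-frame weight (an assumption, not from the abstract):
    ratio of clean energy to noisy energy in each frame, clipped to [0, 1]."""
    clean_energy = np.sum(clean ** 2, axis=1)
    noisy_energy = np.sum(noisy ** 2, axis=1)
    return np.clip(clean_energy / (noisy_energy + eps), 0.0, 1.0)


def plain_mse(predicted, clean):
    """Ordinary MSE: every frame and bin contributes equally."""
    return float(np.mean((predicted - clean) ** 2))


def weighted_mse(predicted, clean, noisy):
    """MSE in which each frame's error is multiplied by its weight,
    so frames dominated by speech count more than silent or noisy ones."""
    weights = frame_weights(clean, noisy)
    per_frame_mse = np.mean((predicted - clean) ** 2, axis=1)
    return float(np.mean(weights * per_frame_mse))


# Toy data: frame 0 contains speech, frame 1 is silence plus noise.
clean = np.array([[1.0, 1.0], [0.0, 0.0]])
noisy = np.array([[1.0, 1.0], [1.0, 1.0]])
predicted = np.array([[0.0, 0.0], [1.0, 1.0]])

loss_plain = plain_mse(predicted, clean)
loss_weighted = weighted_mse(predicted, clean, noisy)
```

In this toy case both frames have the same squared error, but under the assumed weighting the silent frame receives a weight of zero, so only the speech frame contributes to the weighted loss. In a training setup the same weighting would be applied inside the framework's loss function rather than with NumPy.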
Database: OpenAIRE