Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

Author: Masuyama, Yoshiki; Togami, Masahito; Komatsu, Tatsuya
Publication Year: 2019
Subject:
Document Type: Working Paper
Description: In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained with multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been applied in many applications, where the TF masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have been employed. Although such a training criterion is simple, it does not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions that evaluate the estimated spatial covariance matrices based on the multichannel Itakura–Saito divergence. DNNs trained with these multichannel loss functions can be used to construct several types of beamformers. Experimental results confirmed their effectiveness and their robustness to variations in the microphone configuration.
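
The description mentions two ingredients: TF masks used to estimate spatial covariance matrices (SCMs), and a loss based on the multichannel Itakura–Saito divergence between SCMs. Below is a minimal NumPy sketch of both, plus a Souden-style MVDR beamformer as one example of a beamformer built from the estimated SCMs. The function names, the STFT layout (channels, frames, frequencies), and the eps regularization are illustrative assumptions, not the authors' implementation.

import numpy as np

def masked_scm(stft, mask, eps=1e-8):
    # Mask-weighted SCM per frequency bin.
    # stft: complex array (channels C, frames T, freqs F)
    # mask: real TF mask (T, F), values in [0, 1]
    # returns: SCMs of shape (F, C, C)
    C, T, F = stft.shape
    scm = np.zeros((F, C, C), dtype=complex)
    for f in range(F):
        X = stft[:, :, f]                       # (C, T)
        w = mask[:, f]                          # (T,)
        # sum_t w[t] * x_t x_t^H, normalized by the total mask weight
        scm[f] = (w * X) @ X.conj().T / max(w.sum(), eps)
    return scm

def multichannel_is_divergence(R_hat, R, eps=1e-8):
    # Multichannel Itakura–Saito divergence between PSD matrices:
    #   D(R_hat, R) = tr(R_hat R^{-1}) - log det(R_hat R^{-1}) - C
    # Accepts single (C, C) matrices or batches (F, C, C).
    C = R.shape[-1]
    Ri = np.linalg.inv(R + eps * np.eye(C))
    M = R_hat @ Ri
    trace = np.real(np.trace(M, axis1=-2, axis2=-1))
    logdet = np.linalg.slogdet(M)[1]            # log|det M|, real for PSD inputs
    return trace - logdet - C

def mvdr_weights(R_s, R_n, ref=0):
    # Reference-channel MVDR (Souden et al.) from speech/noise SCMs:
    #   w = (R_n^{-1} R_s e_ref) / tr(R_n^{-1} R_s), computed per bin
    num = np.linalg.solve(R_n, R_s)             # R_n^{-1} R_s, shape (F, C, C)
    tr = np.trace(num, axis1=-2, axis2=-1)      # (F,)
    return num[..., :, ref] / tr[..., None]     # (F, C)

A monaural loss would score the mask itself; the multichannel loss above instead scores the SCMs the mask produces, which is what the beamformer actually consumes, so the training criterion matches the downstream use.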
Comment: 5 pages, Accepted at INTERSPEECH 2019
Database: arXiv