Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
Authors: Masuyama, Yoshiki; Togami, Masahito; Komatsu, Tatsuya
Year: 2019
Document type: Working Paper
Description: In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained with multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been applied in many applications, where the TF masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have typically been employed. Although such training criteria are simple, they do not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions that evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained with the multichannel loss functions can be used to construct several beamformers. Experimental results confirmed their effectiveness and their robustness to microphone configurations.
Comment: 5 pages, accepted at INTERSPEECH 2019
Database: arXiv
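The two operations at the heart of the abstract are (a) estimating spatial covariance matrices (SCMs) from DNN-predicted TF masks and (b) scoring the estimated SCMs with the multichannel Itakura-Saito divergence. The sketch below is a minimal, hypothetical PyTorch illustration of those two steps; the function names, tensor shapes, and the diagonal-loading regularization are my assumptions, and the paper's exact loss formulation may differ in detail.

```python
import torch

def masked_scm(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix per frequency bin.

    stft: complex tensor (F, T, M) -- multichannel STFT of the mixture
    mask: real tensor (F, T) in [0, 1] -- TF mask predicted by the DNN
    returns: complex tensor (F, M, M)
    """
    # R_f = sum_t m(t, f) x(t, f) x(t, f)^H / sum_t m(t, f)
    weighted = mask.unsqueeze(-1) * stft                        # (F, T, M)
    scm = torch.einsum("ftm,ftn->fmn", weighted, stft.conj())   # (F, M, M)
    return scm / mask.sum(dim=1).clamp_min(eps)[:, None, None]

def multichannel_is_loss(scm_est, scm_ref, eps=1e-6):
    """Multichannel Itakura-Saito divergence, summed over frequency:

        D(R, R_hat) = tr(R_hat^{-1} R) - log det(R_hat^{-1} R) - M

    This is the general matrix form of the divergence; the loss used in
    the paper may differ in its exact formulation.
    """
    n_mics = scm_ref.shape[-1]
    eye = torch.eye(n_mics, dtype=scm_est.dtype, device=scm_est.device)
    scm_est = scm_est + eps * eye                  # diagonal loading
    prod = torch.linalg.solve(scm_est, scm_ref)    # R_hat^{-1} R
    trace = prod.diagonal(dim1=-2, dim2=-1).sum(-1).real
    logdet = torch.linalg.slogdet(prod).logabsdet
    return (trace - logdet - n_mics).sum()
```

Because the divergence is differentiable in the mask, a loss of this form can train the DNN end-to-end against reference SCMs of each source, rather than against a monaural mask target; the estimated SCMs can then be plugged into standard SCM-based beamformers (e.g., MVDR-style), which matches the setting the abstract describes.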