Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
Authors: Masuyama, Yoshiki; Togami, Masahito; Komatsu, Tatsuya
Year: 2019
Document type: Working Paper
Description: In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained with multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been applied in many applications, where the TF masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have typically been employed. Although such training criteria are simple, they do not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions that evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained with the multichannel loss functions can be used to construct several beamformers. Experimental results confirmed their effectiveness and their robustness to microphone configurations.
Comment: 5 pages, accepted at INTERSPEECH 2019
Database: arXiv
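The two operations at the heart of the abstract are (a) estimating spatial covariance matrices (SCMs) from DNN-predicted TF masks and (b) scoring the estimated SCMs with the multichannel Itakura-Saito divergence. The sketch below is a minimal, hypothetical PyTorch illustration of those two steps; the function names, tensor shapes, and the diagonal-loading regularization are my assumptions, and the paper's exact loss formulation may differ in detail.

```python
import torch

def masked_scm(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix per frequency bin.

    stft: complex tensor (F, T, M) -- multichannel STFT of the mixture
    mask: real tensor (F, T) in [0, 1] -- TF mask predicted by the DNN
    returns: complex tensor (F, M, M)
    """
    # R_f = sum_t m(t, f) x(t, f) x(t, f)^H / sum_t m(t, f)
    weighted = mask.unsqueeze(-1) * stft                        # (F, T, M)
    scm = torch.einsum("ftm,ftn->fmn", weighted, stft.conj())   # (F, M, M)
    return scm / mask.sum(dim=1).clamp_min(eps)[:, None, None]

def multichannel_is_loss(scm_est, scm_ref, eps=1e-6):
    """Multichannel Itakura-Saito divergence, summed over frequency:

        D(R, R_hat) = tr(R_hat^{-1} R) - log det(R_hat^{-1} R) - M

    This is the general matrix form of the divergence; the loss used in
    the paper may differ in its exact formulation.
    """
    n_mics = scm_ref.shape[-1]
    eye = torch.eye(n_mics, dtype=scm_est.dtype, device=scm_est.device)
    scm_est = scm_est + eps * eye                  # diagonal loading
    prod = torch.linalg.solve(scm_est, scm_ref)    # R_hat^{-1} R
    trace = prod.diagonal(dim1=-2, dim2=-1).sum(-1).real
    logdet = torch.linalg.slogdet(prod).logabsdet
    return (trace - logdet - n_mics).sum()
```

Because the divergence is differentiable in the mask, a loss of this form can train the DNN end-to-end against reference SCMs of each source, rather than against a monaural mask target; the estimated SCMs can then be plugged into standard SCM-based beamformers (e.g., MVDR-style), which matches the setting the abstract describes.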