SLOGD: Speaker LOcation Guided Deflation approach to speech separation

Author:	Sivasankaran, Sunit; Vincent, Emmanuel; Fohr, Dominique
Year of publication:	2019
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	Speech separation is the process of separating multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein the sources are estimated iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset, using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
Comment:	Submitted to ICASSP 2020
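The iterative loop described above (localize one speaker, estimate its mask, remove it from the mixture, repeat) can be sketched as follows. This is a minimal toy under stated assumptions, not the authors' implementation: the record does not say how locations and masks are estimated, so both steps here use hypothetical phase-based rules, namely a DUET-style histogram vote over per-bin inter-microphone delays and a binary mask. The mixture is noiseless, uses two microphones, and assigns each time-frequency bin to a single dominant speaker. The function names (`bin_delays`, `localize`, `estimate_mask`, `deflation_separation`), the delay grid and the tolerance `tol` are all illustrative.

```python
import cmath
import math

# Hypothetical stand-ins: localization and masking are simple phase rules on a
# noiseless two-microphone mixture, chosen only to make the deflation loop visible.


def bin_delays(X1, X2, omegas, eps=1e-12):
    """Per-bin inter-mic delay (samples) from the cross-spectrum phase; None if the bin is empty."""
    delays = []
    for x1, x2, w in zip(X1, X2, omegas):
        cross = x1 * x2.conjugate()
        delays.append(cmath.phase(cross) / w if abs(cross) > eps else None)
    return delays


def localize(delays, candidates, tol):
    """Pick the candidate delay supported by the most bins (histogram vote)."""
    def votes(tau):
        return sum(1 for d in delays if d is not None and abs(d - tau) < tol)
    return max(candidates, key=votes)


def estimate_mask(delays, tau, tol):
    """Binary mask: 1 for bins whose delay matches the localized speaker."""
    return [1.0 if d is not None and abs(d - tau) < tol else 0.0 for d in delays]


def deflation_separation(X1, X2, omegas, num_sources, candidates, tol=0.05):
    """Iteratively localize, mask, and subtract (deflate) one source at a time."""
    X1, X2 = list(X1), list(X2)
    estimates = []
    for _ in range(num_sources):
        delays = bin_delays(X1, X2, omegas)
        tau = localize(delays, candidates, tol)
        mask = estimate_mask(delays, tau, tol)
        S1 = [m * x for m, x in zip(mask, X1)]
        S2 = [m * x for m, x in zip(mask, X2)]
        estimates.append((tau, mask, S1, S2))
        # Deflation: remove the estimated source before localizing the next one.
        X1 = [x - s for x, s in zip(X1, S1)]
        X2 = [x - s for x, s in zip(X2, S2)]
    return estimates


# Synthetic mixture: 12 frequency bins, 8 dominated by a speaker at a delay of
# -0.6 samples and 4 by a speaker at +0.5 samples (one dominant source per bin).
K = 12
omegas = [math.pi * (k + 1) / (K + 1) for k in range(K)]
true_delays = [-0.6] * 8 + [0.5] * 4
X1 = [complex(1.0 + 0.1 * k, 0.0) for k in range(K)]
X2 = [x * cmath.exp(-1j * w * t) for x, w, t in zip(X1, omegas, true_delays)]
candidates = [round(-1.0 + 0.1 * i, 1) for i in range(21)]

estimates = deflation_separation(X1, X2, omegas, 2, candidates)
print([tau for tau, _, _, _ in estimates])  # [-0.6, 0.5]
```

The speaker supported by more bins is localized first. Once its masked estimate is subtracted, those bins carry no energy and no longer vote, so the second iteration localizes the remaining speaker. This is the point of deflation: each new localization sees a simpler mixture.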
Database:	arXiv
External link:	http://arxiv.org/abs/1910.11131