Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

Author: Dovrat, Shaked; Nachmani, Eliya; Wolf, Lior
Publication year: 2021
Subject:
Document type: Working Paper
Description: Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for the current methods, which rely on the Permutation Invariant Loss (PIT). In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity, where $C$ is the number of speakers, in comparison to $O(C!)$ of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin.
Comment: Accepted to Interspeech 2021, Data creation link added
Database: arXiv
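
A minimal sketch, not the authors' implementation, of how an optimal speaker assignment can be computed with the Hungarian algorithm in $O(C^3)$ instead of enumerating all $C!$ permutations as in standard PIT. The negative-SI-SNR cost and all function names below are illustrative assumptions.

    # Sketch: Hungarian-algorithm assignment of estimated sources to reference
    # speakers. The assignment step is O(C^3), versus O(C!) for exhaustive PIT.
    # Cost choice (negative SI-SNR) and names are illustrative assumptions.
    import numpy as np
    from scipy.optimize import linear_sum_assignment


    def si_snr(est, ref, eps=1e-8):
        """Scale-invariant SNR between one estimated and one reference signal."""
        ref_energy = np.sum(ref ** 2) + eps
        proj = (np.sum(est * ref) / ref_energy) * ref   # projection of est onto ref
        noise = est - proj
        return 10 * np.log10((np.sum(proj ** 2) + eps) / (np.sum(noise ** 2) + eps))


    def hungarian_permutation_loss(estimates, references):
        """estimates, references: arrays of shape (C, T) for C speakers.

        Builds the C x C matrix of pairwise losses, then solves the assignment
        problem with the Hungarian algorithm (scipy's linear_sum_assignment).
        Returns the mean loss under the optimal permutation and the permutation.
        """
        C = estimates.shape[0]
        cost = np.empty((C, C))
        for i in range(C):          # estimated source i
            for j in range(C):      # reference speaker j
                cost[i, j] = -si_snr(estimates[i], references[j])  # lower is better
        rows, cols = linear_sum_assignment(cost)   # optimal one-to-one assignment
        return cost[rows, cols].mean(), cols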