Author: |
Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot |
Language: |
English |
Year of publication: |
2021 |
Subject: |
|
Source: |
EURASIP Journal on Audio, Speech, and Music Processing, Vol 2021, Iss 1, Pp 1-10 (2021) |
Document type: |
article |
ISSN: |
1687-4722 |
DOI: |
10.1186/s13636-021-00203-w |
Description: |
In this study, we present a deep neural network-based online multi-speaker localization algorithm that uses a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is dominated by a single speaker and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. The high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and dynamic. An elaborate experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms. Finally, as a byproduct, we further show that the proposed method is also capable of separating moving speakers by applying the obtained TF masks. |
Database: |
Directory of Open Access Journals |
External link: |
|
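
Illustrative note: the description above says that a DOA is estimated for each TF bin and that the resulting TF masks can separate speakers. The following is a minimal sketch of that post-processing idea only, not the authors' network or code. It assumes the per-bin DOA class map is already available (here a hand-built toy array), and the function names, the 13-class DOA grid, and the choice of a histogram to pick the dominant directions are all assumptions made for illustration.

```python
import numpy as np


def doa_histogram(doa_map, n_classes):
    """Count how many TF bins were assigned to each DOA class."""
    return np.bincount(doa_map.ravel(), minlength=n_classes)


def top_k_doas(doa_map, n_classes, k):
    """Return the k DOA classes that dominate the most TF bins, in ascending order."""
    hist = doa_histogram(doa_map, n_classes)
    return np.sort(np.argsort(hist)[-k:])


def masks_from_doa_map(doa_map, speaker_classes):
    """Build one binary TF mask per speaker: 1 where the bin's DOA class matches."""
    return np.stack([(doa_map == c).astype(np.float32) for c in speaker_classes])


# Toy stand-in for a network output: one DOA class index per (frequency, frame) bin.
# The grid of 13 classes is hypothetical; the record does not state the resolution.
n_freq, n_frames, n_classes = 6, 8, 13
doa_map = np.full((n_freq, n_frames), 3, dtype=np.int64)
doa_map[::2, :] = 9  # even frequency rows are dominated by a second direction

speakers = top_k_doas(doa_map, n_classes, k=2)
masks = masks_from_doa_map(doa_map, speakers)

# Applying a mask to a (here random) mixture spectrogram keeps one speaker's bins.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
separated = masks * mixture  # shape: (n_speakers, n_freq, n_frames)
```

Under the W-disjoint orthogonality assumption each bin belongs to exactly one speaker, so the binary masks in this sketch partition the TF plane. For moving speakers, the same histogram could be computed per frame or over a short sliding window rather than over the whole map.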