Multiple Speaker Localization using Mixture of Gaussian Model with Manifold-based Centroids
Author: | Avital Bross, Sharon Gannot, Bracha Laufer-Goldshtein |
---|---|
Year of publication: | 2021 |
Subject: | Reverberation; Computer science; Microphone; business.industry; Maximum likelihood; Feature vector; Centroid; 020206 networking & telecommunications; Pattern recognition; 02 engineering and technology; Mixture model; symbols.namesake; Computer Science::Sound; 0202 electrical engineering, electronic engineering, information engineering; symbols; 020201 artificial intelligence & image processing; Artificial intelligence; Cluster analysis; business; Gaussian network model |
Source: | EUSIPCO 2020 28th European Signal Processing Conference (EUSIPCO) |
DOI: | 10.23919/eusipco47968.2020.9287796 |
Description: | A data-driven approach for multiple-speaker localization in reverberant enclosures is presented. The approach combines semi-supervised learning on multiple manifolds with unsupervised maximum likelihood estimation. Relative transfer functions (RTFs), which are known to be related to source positions, are used as feature vectors in both stages of the proposed algorithm. The microphone positions are not known. In the training stage, a nonlinear, manifold-based mapping between RTFs and source locations is inferred using single-speaker utterances. The inference procedure utilizes two RTF datasets: a small set of RTFs with their associated position labels, and a large set of unlabelled RTFs. This mapping is used to generate a dense grid of localized sources that serve as the centroids of a Mixture of Gaussians (MoG) model, which is used in the test stage of the algorithm to cluster RTFs extracted from multiple-speaker utterances. Clustering is performed with the expectation-maximization (EM) procedure, which relies on the sparsity and intermittency of the speech signals. A preliminary experimental study, with either two or three overlapping speakers at various reverberation levels, demonstrates that the proposed scheme achieves high localization accuracy compared to a baseline method using a simpler propagation model. |
Database: | OpenAIRE |
External link: |
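
The test stage described in the abstract (clustering features with a Mixture of Gaussians whose centroids lie on a fixed grid of candidate positions) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes toy 2-D feature vectors in place of RTFs, a shared isotropic variance `sigma2`, and an EM loop that updates only the mixture weights; the function names `em_fixed_centroids` and `top_peaks` are hypothetical.

```python
import math
import random


def em_fixed_centroids(features, centroids, sigma2=0.05, n_iter=50):
    """EM for a Gaussian mixture with fixed centroids; only the weights are learned.

    Assumption: every component shares the same isotropic variance sigma2.
    """
    k = len(centroids)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for x in features:
            # E-step: log(weight) + log-likelihood; shared constants cancel out.
            logs = []
            for w, c in zip(weights, centroids):
                d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                logs.append(math.log(w + 1e-300) - d2 / (2.0 * sigma2))
            m = max(logs)
            post = [math.exp(v - m) for v in logs]
            total = sum(post)
            for j in range(k):
                counts[j] += post[j] / total
        # M-step: each weight is the average posterior of its component.
        weights = [cnt / len(features) for cnt in counts]
    return weights


def top_peaks(weights, centroids, n_sources):
    """Return the centroids that carry the largest mixture weights."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return [centroids[j] for j in order[:n_sources]]


# Toy data: a 5 x 5 grid of candidate positions and two active sources.
rng = random.Random(0)
grid = [(x * 0.5, y * 0.5) for x in range(5) for y in range(5)]
true_positions = [(0.5, 1.5), (2.0, 0.5)]
features = [
    (px + rng.gauss(0, 0.1), py + rng.gauss(0, 0.1))
    for px, py in true_positions
    for _ in range(200)
]

weights = em_fixed_centroids(features, grid)
estimates = top_peaks(weights, grid, n_sources=2)
print(sorted(estimates))
```

Keeping the centroids fixed means EM only has to decide how much each candidate position contributes, so the grid points with the largest weights act as the location estimates. In the toy data the sources sit exactly on grid points; real features would need a denser grid and some form of neighbor suppression when picking peaks.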