Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization

Autor:	Paris Xylogiannis, Nikolaos Vryzas, Lazaros Vrysis, Charalampos Dimoulas
Jazyk:	angličtina
Rok vydání:	2024
Předmět:	speaker diarization sound localization AI-enabled systems multimodal decision making deep learning smartphones Chemical technology TP1-1185
Zdroj:	Sensors, Vol 24, Iss 13, p 4229 (2024)
Druh dokumentu:	article
ISSN:	1424-8220
DOI:	10.3390/s24134229
Popis:	Speaker diarization consists of answering the question of “who spoke when” in audio recordings. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by the exploitation of spatial features. This work proposes a framework designed to assess the effectiveness of combining speaker embeddings with Time Difference of Arrival (TDOA) values from available microphone sensor arrays in meetings. We extract speaker embeddings using two popular and robust pre-trained models, ECAPA-TDNN and X-vectors, and calculate the TDOA values via the Generalized Cross-Correlation (GCC) method with Phase Transform (PHAT) weighting. Although ECAPA-TDNN outperforms the Xvectors model, we utilize both speaker embedding models to explore the potential of employing a computationally lighter model when spatial information is exploited. Various techniques for combining the spatial–temporal information are examined in order to determine the best clustering method. The proposed framework is evaluated on two multichannel datasets: the AVLab Speaker Localization dataset and a multichannel dataset (SpeaD-M3C) enriched in the context of the present work with supplementary information from smartphone recordings. Our results strongly indicate that the integration of spatial information can significantly improve the performance of state-of-the-art deep learning diarization models, presenting a 2–3% reduction in DER compared to the baseline approach on the evaluated datasets.
Databáze:	Directory of Open Access Journals
Externí odkaz:	https://doaj.org/article/ff14739e8a074d0dba4886db4816059b Zobrazit plný text záznamu View record in DOAJ Plný text ve formátu PDF Plný text ve formátu HTML
Nepřihlášeným uživatelům se plný text nezobrazuje	K zobrazení výsledku je třeba se přihlásit.