A comparison study on patient-psychologist voice diarization

Author:	Riad, Rachid, Titeux, Hadrien, Cao, Xuan Nga, Dupoux, Emmanuel, Lemoine, Laurie, Montillot, Justine, Sliwinski, Agnes, Bagnou, Jennifer Hamet, Bachoud-Lévi, Anne-Catherine
Contributors:	Laboratoire de sciences cognitives et psycholinguistique (LSCP), Département d'Etudes Cognitives - ENS Paris (DEC), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS), Apprentissage machine et développement cognitif (CoML), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS)-Département d'Etudes Cognitives - ENS Paris (DEC), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), École des hautes études en sciences sociales (EHESS), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Institut Mondor de Recherche Biomédicale (IMRB), Institut National de la Santé et de la Recherche Médicale (INSERM)-IFR10-Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), Centre de référence maladie de Huntington, Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpital Henri Mondor-CHU Pitié-Salpêtrière [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Sorbonne Université (SU)-Sorbonne Université (SU)-CHU Trousseau [APHP], Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Sorbonne Université (SU), This work is funded in part by Grants from Neuratris, from Facebook AI Research (Research Gift), Google (Faculty Research Award), Microsoft Research (Azure Credits and Grant), and Amazon Web Service (AWS Research Credits), ANR-17-EURE-0017,FrontCog,Frontières en cognition(2017), ANR-10-IDEX-0001,PSL,Paris Sciences et Lettres(2010), ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), CHU Trousseau [APHP], Zermani, Sabrina, Frontières en cognition - - FrontCog2017 - ANR-17-EURE-0017 - EURE - VALID, Initiative d'excellence - Paris Sciences et Lettres - - PSL2010 - ANR-10-IDEX-0001 - IDEX - VALID, PaRis Artificial Intelligence Research InstitutE - - PRAIRIE2019 - ANR-19-P3IA-0001 - P3IA - VALID
Language:	English
Year of publication:	2022
Subject:	[SCCO.LING] Cognitive science/Linguistics
Source:	SLPAT 2022-9th Workshop on Speech and Language Processing for Assistive Technologies, May 2022, Dublin, Ireland; Web of Science
Description:	International audience; Conversations between a clinician and a patient, in natural conditions, are valuable sources of information for medical follow-up. The automatic analysis of these dialogues could help extract new language markers and speed up the clinicians' reports. Yet, it is not clear which model is the most efficient to detect and identify the speaker turns, especially for individuals with speech disorders. Here, we proposed a split of the data that allows a comparative evaluation of different diarization methods. We designed and trained end-to-end neural network architectures to tackle this task directly from the raw signal, and evaluated each approach under the same metric. We also studied the effect of fine-tuning models to find the best performance. Experimental results are reported on naturalistic clinical conversations between psychologists and interviewees at different stages of Huntington's disease, displaying a wide range of speech disorders. Our best end-to-end model achieved 19.5% IER on the test set, compared to 23.6% achieved by fine-tuning the X-vector architecture. Finally, we observed that we could extract clinical markers directly from the automatic systems, highlighting the clinical relevance of our methods.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7511dbe6ec9b204fa23ffa278fd633ef https://hal.inria.fr/hal-03831674