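Macro-segmentation automatique en séquence d'interaction : une approche basée sur les silences pour la structuration de réunions [Automatic macro-segmentation into interaction sequences: a silence-based approach for structuring meetings]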

Authors: Julien Pinquier, Selim Mechrouh, Lionel Pibre, Thomas Pellegrini, Isabelle Ferrané
Contributors: FERRANÉ, Isabelle, Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio (IRIT-SAMoVA), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Financement BPI - Investissement d'avenir, University of Lille - France, LinTO
Language: English
Year of publication: 2021
Subject:
Source: forthcoming
Content-Based Multimedia Indexing (CBMI 2021)
Content-Based Multimedia Indexing (CBMI 2021), University of Lille-France, Jun 2021, Lille, France
CBMI
Description: International audience; Meetings are a common activity in professional contexts, but they remain difficult to analyze because they are not always structured and participants interrupt one another (in a debate of ideas, for example). A first step toward facilitating their analysis is to segment the meeting into zones that are homogeneous at the interaction level. To do so, we studied the typology of the non-speech segments (pauses and silences) in order to determine the different sequences within a meeting. Indeed, information such as the frequency and length of the non-speech segments differs between a presentation and a debate. In this article, we propose an original approach to segment meetings using only the non-speech segments. We apply Voice Activity Detection (VAD) to find the non-speech segments, from which a set of parameters is extracted to study the typology of silence segments. We then slide a window over the whole meeting and apply an unsupervised approach to each window. We validated our approach using purity and coverage metrics on part of the AMI corpus (38 meetings of about 28 minutes each). The approach is non-invasive: it relies only on acoustic information and does not analyze speech content, since moments containing speech, and potentially sensitive information, are not processed.
Database: OpenAIRE
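
Illustrative sketch (not part of the original record): the description above mentions deriving non-speech segments from a VAD output and summarizing their frequency and length over a sliding window. The Python below shows one possible way to compute such per-window statistics. It is only a sketch under stated assumptions: the function names, the window and hop sizes, and the three statistics (gap count, mean gap length, silence ratio) are hypothetical choices, not the parameters used by the authors, and the VAD itself is assumed to have already produced a list of speech intervals.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def non_speech_segments(speech: List[Segment], total: float) -> List[Segment]:
    """Return the gaps left by the speech segments over [0, total].

    `speech` stands in for the output of any VAD (an assumption here).
    """
    gaps: List[Segment] = []
    cursor = 0.0
    for start, end in sorted(speech):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total:
        gaps.append((cursor, total))
    return gaps


def window_features(
    gaps: List[Segment], total: float, win: float = 60.0, hop: float = 30.0
) -> List[Tuple[float, int, float, float]]:
    """Slide a window over the recording and summarize the gaps inside it.

    Each row is (window_start, gap_count, mean_gap_length, silence_ratio).
    Gaps that straddle a window border are clipped to the window.
    """
    rows = []
    t = 0.0
    while t < total:
        w_end = min(t + win, total)
        # Duration of each gap restricted to the current window.
        clipped = [
            min(e, w_end) - max(s, t)
            for s, e in gaps
            if min(e, w_end) > max(s, t)
        ]
        n = len(clipped)
        silent = sum(clipped)
        mean_len = silent / n if n else 0.0
        rows.append((t, n, mean_len, silent / (w_end - t)))
        t += hop
    return rows


# Toy example: 30 s of audio with three speech turns (made-up values).
speech = [(0.0, 10.0), (12.0, 20.0), (25.0, 30.0)]
gaps = non_speech_segments(speech, total=30.0)
rows = window_features(gaps, total=30.0, win=20.0, hop=10.0)
```

The resulting rows could then be passed to an unsupervised method of one's choice; the description does not say which one the authors used, so none is shown here.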