Macro-segmentation automatique en séquence d'interaction : une approche basée sur les silences pour la structuration de réunions
Authors: | Julien Pinquier, Selim Mechrouh, Lionel Pibre, Thomas Pellegrini, Isabelle Ferrané |
---|---|
Contributors: | FERRANÉ, Isabelle, Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio (IRIT-SAMoVA), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Financement BPI - Investissement d'avenir, University of Lille - France, LinTO |
Language: | English |
Publication year: | 2021 |
Subject: |
Voice activity detection; Computer science; Search engine indexing; Feature extraction; [INFO] Computer Science [cs]; Information sensitivity; Audio segmentation; meeting structuring; Sliding window protocol; Segmentation; interaction sequence; Artificial intelligence; Macro; Set (psychology); Natural language processing; clustering |
Source: | Forthcoming in Content-Based Multimedia Indexing (CBMI 2021), University of Lille - France, Jun 2021, Lille, France |
Description: | International audience; Meetings are a common activity in professional contexts, yet they remain difficult to analyze because they are not always structured and participants interrupt one another (in a debate of ideas, for example). A first step toward facilitating their analysis is to segment the meeting into zones that are homogeneous at the interaction level. To do so, we studied the typology of the non-speech segments (pauses and silences) in order to determine the different sequences during a meeting. Indeed, information such as the frequency and length of the non-speech segments differs between a presentation and a debate. In this article, we propose an original approach to segment meetings using only the non-speech segments. We apply Voice Activity Detection (VAD) to find the non-speech segments, from which a set of parameters is extracted to study the typology of silence segments. We then slide a window over the whole meeting and apply an unsupervised approach to each window. We validated our approaches using purity and coverage metrics on part of the AMI corpus (38 meetings of about 28 minutes each). This approach is non-invasive: it relies only on acoustic information and does not analyze speech content, since moments containing speech, and potentially sensitive information, are not processed. |
Database: | OpenAIRE |
External link: |
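
The pipeline summarized in the description (non-speech segments, a sliding window, then purity and coverage for validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature set (`count`, `mean_duration`, `silence_ratio`), the window width and step, and the frame-level definition of purity and coverage are all assumptions chosen for illustration, since the abstract does not specify them. The VAD and clustering steps are omitted; silences are assumed to be given as `(start, end)` pairs in seconds.

```python
from collections import Counter


def window_features(silences, start, end):
    """Summarize the non-speech segments overlapping [start, end), in seconds.

    The three features are illustrative assumptions, not the paper's set.
    """
    # Clip each silence to the window and keep only the overlapping part.
    clipped = [min(e, end) - max(s, start)
               for s, e in silences if s < end and e > start]
    total = sum(clipped)
    count = len(clipped)
    return {
        "count": count,
        "mean_duration": total / count if count else 0.0,
        "silence_ratio": total / (end - start),
    }


def sliding_windows(silences, meeting_length, width, step):
    """Compute features on windows of `width` seconds, moved by `step` seconds."""
    features = []
    t = 0.0
    while t + width <= meeting_length:
        features.append(window_features(silences, t, t + width))
        t += step
    return features


def purity_coverage(reference, hypothesis):
    """Frame-level purity and coverage of two equal-length label sequences.

    Purity: share of frames falling in the dominant reference label of
    their hypothesis label. Coverage: the same with the roles swapped.
    """
    pairs = Counter(zip(reference, hypothesis))
    best_per_hyp = Counter()
    best_per_ref = Counter()
    for (ref, hyp), n in pairs.items():
        best_per_hyp[hyp] = max(best_per_hyp[hyp], n)
        best_per_ref[ref] = max(best_per_ref[ref], n)
    total = len(reference)
    return (sum(best_per_hyp.values()) / total,
            sum(best_per_ref.values()) / total)


# Hypothetical 12-second excerpt with three silences.
silences = [(1.0, 2.0), (4.0, 4.5), (9.0, 12.0)]
features = sliding_windows(silences, meeting_length=12.0, width=6.0, step=3.0)
```

In a fuller version, the per-window feature vectors would be passed to an unsupervised clustering method, and the resulting window labels compared against a reference annotation with `purity_coverage`. A hypothesis that assigns one label to the whole meeting reaches a coverage of 1 but a low purity, which is why the two metrics are reported together.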