Exploiting temporal information to detect conversational groups in videos and predict the next speaker

Authors:	Tosato, Lucrezia; Fortier, Victor; Bloch, Isabelle; Pelachaud, Catherine
Publication year:	2024
Subject:	Computer Science - Computer Vision and Pattern Recognition
Source:	Pattern Recognition Letters, Volume 177, January 2024, Pages 164-168
Document type:	Working Paper
Description:	Studies in human-human interaction have introduced the concept of F-formation to describe the spatial arrangement of participants during social interactions. This paper has two objectives: detecting F-formations in video sequences and predicting the next speaker in a group conversation. The proposed approach exploits temporal information and human multimodal signals in video sequences. In particular, we rely on measuring the engagement level of people as a feature of group belonging. Our approach uses a recurrent neural network, the Long Short-Term Memory (LSTM), to predict who will take the next speaking turn in a conversational group. Experiments on the MatchNMingle dataset led to 85% true positives in group detection and 98% accuracy in predicting the next speaker. Comment: Accepted to Pattern Recognition Letters, 8 pages, 10 figures
Database:	arXiv
External link:	http://arxiv.org/abs/2408.16380