M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Author:	Nguyen-Phuoc, Long; Gaboriau, Renald; Delacroix, Dimitri; Navarro, Laurent
Publication year:	2024
Subject:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP, 869-876, 2024, Rome, Italy
Document type:	Working Paper
DOI:	10.5220/0012575100003660
Description:	This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M integrates audiovisual cues through a dual-pathway architecture with specialized streams for audio and video inputs. A key innovation is its cross-modality multihead attention mechanism, which fuses the two modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. Although it shows modest performance compared to AVCAffe's single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
Database:	arXiv
External link:	http://arxiv.org/abs/2403.09451