Asynchronous and Event-Based Fusion Systems for Affect Recognition on Naturalistic Data in Comparison to Conventional Approaches
Author: | Björn Schuller, Elisabeth André, Raymond Brueckner, Florian Lingenfelser, Jun Deng, Johannes Wagner |
---|---|
Year of publication: | 2018 |
Subject: | Modalities; Artificial neural network; Time delay neural network; Computer science; Electrical & electronic engineering; Engineering and technology; Machine learning; Human-Computer Interaction; Recurrent neural network; Categorization; Asynchronous communication; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; ddc:004; Affective computing; Hidden Markov model; Software |
Source: | IEEE Transactions on Affective Computing. 9:410-423 |
ISSN: | 2371-9850 |
DOI: | 10.1109/taffc.2016.2635124 |
Description: | In many current studies on multimodal fusion, decisions are forced synchronously for fixed time segments across all modalities. Reported success varies, and performance is sometimes worse than unimodal classification. Our goal is the synergistic exploitation of multimodality while implementing a real-time system for affect recognition in a naturalistic setting. We therefore present a categorization of possible fusion strategies for affect recognition on continuous time frames of complete recording sessions, and we evaluate multiple implementations from the resulting categories. These include conventional fusion strategies as well as novel approaches that take the asynchronous nature of the observed modalities into account. Some of the latter algorithms consider temporal alignments between modalities and observed frames by applying asynchronous neural networks that use memory blocks to model temporal dependencies. Others take an indirect approach that introduces events as an intermediate layer to accumulate evidence for the target class across all modalities. Recognition results obtained on a naturalistic conversational corpus show a drop in recognition accuracy when moving from unimodal classification to synchronous multimodal fusion. With our proposed asynchronous and event-based fusion techniques, however, we are able to raise the recognition system's accuracy by 7.83 percent compared to video analysis and by 13.71 percent compared to common fusion strategies. |
Database: | OpenAIRE |
External link: |
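
The event-based idea mentioned in the description (events as an intermediate layer that accumulates evidence for the target class across modalities) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Event` fields, the exponential half-life decay, the summation rule, and the `"neutral"` fallback are all assumptions chosen only to show how asynchronously arriving cues could be fused at an arbitrary query time.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative assumptions."""
    time: float      # seconds since session start
    modality: str    # e.g. "audio" or "video"
    label: str       # class this event is evidence for
    strength: float  # detector confidence in [0, 1]


def fuse(events, now, half_life=2.0):
    """Sum decayed evidence per label at query time `now`.

    Assumption: an event's influence halves every `half_life` seconds.
    """
    scores = {}
    for ev in events:
        age = now - ev.time
        if age < 0:
            continue  # event has not happened yet at the query time
        weight = ev.strength * 0.5 ** (age / half_life)
        scores[ev.label] = scores.get(ev.label, 0.0) + weight
    return scores


def decide(events, now, default="neutral", half_life=2.0):
    """Return the label with the most accumulated evidence, or `default`."""
    scores = fuse(events, now, half_life)
    return max(scores, key=scores.get) if scores else default


# Events arrive at different times from different modalities.
events = [
    Event(time=0.0, modality="audio", label="positive", strength=0.8),
    Event(time=3.0, modality="video", label="negative", strength=0.6),
]
early = decide(events, now=0.5)  # only the audio event has occurred
late = decide(events, now=4.0)   # the recent video event outweighs the old one
```

Because each modality contributes events whenever it detects something, no modality is forced to produce a decision for every fixed time segment, which is the contrast with synchronous fusion that the description draws.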