Multimodal Categorization of Crisis Events in Social Media

Authors: Mahdi Abavisani, Joel Tetreault, Alejandro Jaimes, Shengli Hu, Liwei Wu
Language: English
Year of publication: 2020
Subject:
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer science
Computer Science - Artificial Intelligence
Computer Vision and Pattern Recognition (cs.CV)
Feature extraction
Computer Science - Computer Vision and Pattern Recognition
Sample (statistics)
02 engineering and technology
010501 environmental sciences
Machine learning
computer.software_genre
01 natural sciences
Machine Learning (cs.LG)
Margin (machine learning)
0202 electrical engineering, electronic engineering, information engineering
Social media
0105 earth and related environmental sciences
Computer Science - Computation and Language
Contextual image classification
Event (computing)
business.industry
I.5.4
Information quality
Visualization
Artificial Intelligence (cs.AI)
Categorization
020201 artificial intelligence & image processing
Artificial intelligence
business
computer
Computation and Language (cs.CL)
Source: CVPR
Description: Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real time. Emergency response is one area that stands to gain from these advances. By processing billions of texts and images a minute, systems can detect events automatically, enabling emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, which limits detection performance and degrades the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that takes both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample-by-sample basis. In addition, we employ a multimodal graph-based approach that stochastically transitions between embeddings of different multimodal pairs during training, both to better regularize the learning process and to cope with limited training data by constructing new matched pairs from different samples. We show that our method outperforms unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
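The description names two mechanisms without giving their equations. The pure-Python sketch below illustrates one plausible reading of each, under loudly stated assumptions: (1) cross-attention is modeled as element-wise sigmoid gates, where each modality's features are scaled by a mask computed from the other modality's embedding; (2) the "multimodal graph" is assumed to connect training samples that share a label, and with probability `p` a sample's text is paired with a neighbor's image to form a new matched pair. The functions `cross_attention_gate`, `build_label_graph`, and `stochastic_pair`, the weight matrices, and the parameter `p` are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product: one dot product per row of W.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cross_attention_gate(e_img, e_txt, W_img, W_txt):
    """Hypothetical gating (assumption): each modality is masked by the other.

    W_img has shape (len(e_img), len(e_txt)); W_txt has shape (len(e_txt), len(e_img)).
    """
    mask_img = [sigmoid(z) for z in matvec(W_img, e_txt)]  # text decides which image dims to keep
    mask_txt = [sigmoid(z) for z in matvec(W_txt, e_img)]  # image decides which text dims to keep
    gated_img = [m * x for m, x in zip(mask_img, e_img)]
    gated_txt = [m * x for m, x in zip(mask_txt, e_txt)]
    return gated_img + gated_txt  # concatenated fused feature vector


def build_label_graph(labels):
    """Assumed graph: an edge between every pair of samples sharing a label."""
    graph = {i: [] for i in range(len(labels))}
    for i in range(len(labels)):
        for j in range(len(labels)):
            if i != j and labels[i] == labels[j]:
                graph[i].append(j)
    return graph


def stochastic_pair(idx, img_embs, txt_embs, graph, p, rng):
    """With probability p, pair sample idx's text with a random neighbor's image.

    Samples without neighbors always keep their original image.
    """
    img, txt = img_embs[idx], txt_embs[idx]
    if graph[idx] and rng.random() < p:
        img = img_embs[rng.choice(graph[idx])]
    return img, txt


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["flood", "fire", "flood"]
    img_embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    txt_embs = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]
    graph = build_label_graph(labels)
    # With p=1.0, sample 0's only same-label neighbor is sample 2,
    # so its text is paired with sample 2's image.
    img, txt = stochastic_pair(0, img_embs, txt_embs, graph, p=1.0, rng=rng)
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # Zero weights give sigmoid(0) = 0.5 masks, so every feature is halved.
    print(cross_attention_gate(img, txt, zeros, zeros))
```

In a real model the gate weights would be learned and `p` tuned; fixing them here only makes the data flow easy to trace by hand.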
Conference on Computer Vision and Pattern Recognition (CVPR 2020)
Database: OpenAIRE