Multimodal news analytics using measures of cross-modal entity and context consistency

Authors: Sebastian Diering, Jonas Theiner, Sherzod Hakimov, Ralph Ewerth, Maximilian Idahl, Eric Müller-Budack
Language: English
Year of publication: 2021
Subject:
Cover (telecommunications)
Computer science
Image-text relations
050801 communication & media studies
Context (language use)
News analytics
02 engineering and technology
Library and Information Sciences
Cross-modal consistency
Image repurposing detection
ddc:070
Consistency (database systems)
0508 media and communications
Dewey Decimal Classification::000 | Allgemeines, Wissenschaft::000 | Informatik, Wissen, Systeme::004 | Informatik
Similarity (psychology)
0202 electrical engineering, electronic engineering, information engineering
Media Technology
Information retrieval
Dewey Decimal Classification::600 | Technik::660 | Technische Chemie
05 social sciences
Dewey Decimal Classification::000 | Allgemeines, Wissenschaft::020 | Bibliotheks- und Informationswissenschaft
Contrast (statistics)
Modal
ddc:020
Dewey Decimal Classification::000 | Allgemeines, Wissenschaft::070 | Nachrichtenmedien, Journalismus, Verlagswesen
ddc:660
020201 artificial intelligence & image processing
ddc:004
Coherence (linguistics)
Information Systems
Source: International Journal of Multimedia Information Retrieval 10 (2021), No. 2
DOI: 10.15488/12349
Description: The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., text supplemented with photographs, is typically used to convey the news more effectively or to attract attention. The photographs can be decorative or depict additional details, but they might also contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases, such measures might give hints for detecting fake news, which is an increasingly important topic in today’s society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of the entities in text and photograph by exploiting state-of-the-art computer vision approaches. In contrast to previous work, our system automatically acquires example data from the Web and is applicable to real-world news. Moreover, an approach that quantifies contextual image-text relations is introduced. The feasibility is demonstrated on two datasets that cover different languages, topics, and domains.
Database: OpenAIRE