V-LTCS: Backbone exploration for Multimodal Misogynous Meme detection

Authors:	Sneha Chinivar, Roopa M.S., Arunalatha J.S., Venugopal K.R.
Language:	English
Publication year:	2024
Subject:	Misogynous Memes; Neural network classifier; Vision-language transformer model; Computational linguistics. Natural language processing; P98-98.5
Source:	Natural Language Processing Journal, Vol 9, Iss , Pp 100109- (2024)
Document type:	article
ISSN:	2949-7191
DOI:	10.1016/j.nlp.2024.100109
Description:	Memes have become a fundamental part of online communication and humour, reflecting and shaping the culture of today’s digital age. The amplified Meme culture is inadvertently endorsing and propagating casual Misogyny. This study proposes V-LTCS (Vision-Language Transformer Combination Search), a framework that encompasses all possible combinations of the most fitting Text (i.e. BERT, ALBERT, and XLM-R) and Vision (i.e. Swin, ConvNeXt, and ViT) Transformer Models to determine the backbone architecture for identifying Memes that contain misogynistic content. All feasible Vision-Language Transformer Model combinations obtained from the recognized optimal Text and Vision Transformer Models are evaluated on two datasets (one smaller, one larger) using standard metrics (viz. Accuracy, Precision, Recall, and F1-Score). The BERT-ViT combination performed well on both datasets, supporting its use as a backbone architecture for subsequent efforts to recognize Multimodal Misogynous Memes.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/d143db5b4fcc48b596bcd11dd47981e6
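
The combination search described in the abstract can be illustrated with a minimal Python sketch. It only enumerates the nine Text-Vision pairings named in the record and computes the four listed metrics for binary labels. The `evaluate` callback, the function names, and the choice to rank pairings by F1-Score are assumptions made for illustration; the record does not state how the authors trained, fused, or ranked the models.

```python
from itertools import product

# Backbone names as listed in the abstract.
TEXT_BACKBONES = ["BERT", "ALBERT", "XLM-R"]
VISION_BACKBONES = ["Swin", "ConvNeXt", "ViT"]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = misogynous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def combination_search(evaluate):
    """Score every Text-Vision pairing.

    `evaluate` is a hypothetical callback (not from the paper) that takes
    (text_name, vision_name) and returns (y_true, y_pred) for a test set.
    """
    results = {}
    for text_name, vision_name in product(TEXT_BACKBONES, VISION_BACKBONES):
        y_true, y_pred = evaluate(text_name, vision_name)
        results[(text_name, vision_name)] = binary_metrics(y_true, y_pred)
    return results


def best_by_f1(results):
    """Assumed ranking rule: pick the pairing with the highest F1-Score."""
    return max(results, key=lambda pair: results[pair]["f1"])
```

In practice `evaluate` would wrap fine-tuning and inference for one pairing on one dataset, and the search would be repeated per dataset so that a pairing can be compared across both.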