No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection

Author:	Suranjana Samanta, Amar Prakash Azad, Mohit Bhardwaj, Debanjana Kar
Language:	English
Year of publication:	2020
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Coronavirus disease 2019 (COVID-19); Computer science; Machine Learning (cs.LG); Scarcity; Social media; Misinformation; Social and Information Networks (cs.SI); Hindi; Computer Science - Computation and Language; Computer Science - Social and Information Networks; Zero (linguistics); Outreach; Bengali language; Artificial intelligence; Computation and Language (cs.CL); Natural language processing
Description:	The sudden, widespread menace created by the current global COVID-19 pandemic has had an unprecedented effect on our lives. Humankind is experiencing immense fear and depending on social media like never before. Fear inevitably leads to panic, speculation, and the spread of misinformation. Many governments have taken measures to curb the spread of such misinformation for public well-being. Besides global measures, systems for demographically local languages have an important role to play in achieving effective outreach. Towards this end, we propose an approach to detect fake news about COVID-19 early on from social media, such as tweets, in multiple Indic languages besides English. In addition, we create an annotated dataset of Hindi and Bengali tweets for fake news detection. We propose a BERT-based model augmented with additional relevant features extracted from Twitter to identify fake tweets. To extend our approach to multiple Indic languages, we use an mBERT-based model fine-tuned on the created Hindi and Bengali dataset. We also propose a zero-shot learning approach to alleviate the data scarcity issue for such low-resource languages. Through rigorous experiments, we show that our approach reaches an F-Score of around 89% in fake tweet detection, surpassing the state-of-the-art (SOTA) results. Moreover, we establish the first benchmark for two Indic languages, Hindi and Bengali. Using our annotated data, our model achieves F-Scores of about 79% for Hindi and 81% for Bengali tweets. Our zero-shot model achieves F-Scores of about 81% for Hindi and 78% for Bengali tweets without any annotated data, which clearly indicates the efficacy of our approach. 6 pages, 4 figures.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::22b7e4d4f38e8e436a77f620623842f8 http://arxiv.org/abs/2010.06906
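The record's description says the BERT-based model is "augmented with additional relevant features extracted from Twitter" but does not say how the two are combined. A common pattern, assumed here and not taken from the paper, is to concatenate the sentence embedding produced by the encoder (BERT or mBERT) with numeric tweet features and pass the result to a classification layer. The minimal sketch below uses plain Python lists in place of a real encoder; the function names `fuse_features`, `fake_probability`, and `f1_score`, the logistic head, and any feature choices are hypothetical illustrations. It also shows a binary F1 computation for the fake class; the paper may average its reported F-Scores differently.

```python
import math
from collections.abc import Sequence


def fuse_features(text_embedding: Sequence[float],
                  tweet_features: Sequence[float]) -> list[float]:
    """Concatenate a sentence embedding (assumed to come from BERT/mBERT)
    with hypothetical numeric tweet features (e.g. retweet count, has-URL flag)."""
    return list(text_embedding) + list(tweet_features)


def fake_probability(fused: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Hypothetical logistic head: sigmoid(w . x + b) gives P(tweet is fake)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive (fake = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-dim "embedding" plus one tweet feature; real embeddings are 768-dim.
    x = fuse_features([0.3, -0.1], [1.0])
    print(fake_probability(x, [0.5, 0.2, 0.8], bias=-0.4))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Concatenation keeps the text and metadata signals separate until the classification layer, so the same head can sit on top of either BERT (English) or mBERT (Hindi, Bengali) embeddings; in practice the weights would be learned jointly during fine-tuning rather than set by hand as in the toy call above.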