No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection
Author: | Suranjana Samanta, Amar Prakash Azad, Mohit Bhardwaj, Debanjana Kar |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Coronavirus disease 2019 (COVID-19); Computer science; Machine Learning (cs.LG); Scarcity; Social media; Misinformation; Social and Information Networks (cs.SI); Hindi; Computer Science - Computation and Language; Computer Science - Social and Information Networks; Zero (linguistics); Outreach; Bengali language; Artificial intelligence; Computation and Language (cs.CL); Natural language processing |
Description: | The sudden, widespread menace of the ongoing COVID-19 pandemic has had an unprecedented effect on our lives. Mankind is experiencing fear, and dependence on social media, like never before. Fear inevitably leads to panic, speculation, and the spread of misinformation. Many governments have taken measures to curb the spread of such misinformation for the sake of public well-being. Besides global measures, systems for demographically local languages have an important role to play in achieving effective outreach. To this end, we propose an approach to detect fake news about COVID-19 early on from social media, such as tweets, for multiple Indic languages besides English. In addition, we create an annotated dataset of Hindi and Bengali tweets for fake news detection. We propose a BERT-based model augmented with additional relevant features extracted from Twitter to identify fake tweets. To extend our approach to multiple Indic languages, we resort to an mBERT-based model fine-tuned on the created Hindi and Bengali dataset. We also propose a zero-shot learning approach to alleviate the data scarcity issue for such low-resource languages. Through rigorous experiments, we show that our approach reaches around 89% F-score in fake tweet detection, which surpasses the state-of-the-art (SOTA) results. Moreover, we establish the first benchmark for two Indic languages, Hindi and Bengali. Using our annotated data, our model achieves about 79% F-score on Hindi tweets and 81% F-score on Bengali tweets. Our zero-shot model achieves about 81% F-score on Hindi tweets and 78% F-score on Bengali tweets without any annotated data, which clearly indicates the efficacy of our approach. 6 pages, 4 figures |
Database: | OpenAIRE |
External link: |