Abstract: |
The COVID-19 pandemic created an infodemic: through social media such as Twitter and Instagram, people were confronted with a massive amount of information. These platforms have made information easy to circulate and have paved the way for information and misinformation to mix. One way to counter an infodemic is to limit the spread of false information by filtering fake news, thereby reducing its negative impact on society. This article studies the properties of fake news in English and Persian using the textual information conveyed through the language of the news. To this end, we study text properties based on information theory, stylometric information from the raw texts, the readability of the texts, and linguistic information such as phonology, syntax, and morphology. We use the XLM-RoBERTa representation with a convolutional neural network classifier as the baseline model for detecting English and Persian COVID-19 fake news. In addition, we propose several learning scenarios in which different feature sets are concatenated with the contextualized representation. According to the experimental results, adding any of the textual feature sets to the baseline model improved the performance of the classifier for both English and Persian. Readability and stylometry features were the most effective for detecting English fake news, improving the F-measure by 2.72%. For Persian, augmenting this feature setting with the amount of information and morphological information improved the F-measure by 3.79%. [ABSTRACT FROM AUTHOR] |
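
The idea of concatenating hand-crafted feature sets with a contextualized representation can be illustrated with a minimal sketch. The abstract does not list the exact features, so everything below is an assumption made for illustration: the function names (`shannon_entropy`, `stylometry_features`, `augment`), the three stylometric measures, and the character-level entropy are hypothetical stand-ins, and the `embedding` argument is a placeholder for a pooled XLM-RoBERTa vector. The convolutional classifier itself is not shown.

```python
import math
import unicodedata
from collections import Counter
from typing import List


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (illustrative information-theoretic feature)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stylometry_features(text: str) -> List[float]:
    """Three simple stylometric measures; an assumed, not the authors', feature set."""
    words = text.split()
    if not words:
        return [0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len({w.lower() for w in words}) / len(words)
    # Unicode category "P*" covers punctuation in both English and Persian text.
    punct_ratio = sum(
        unicodedata.category(ch).startswith("P") for ch in text
    ) / len(text)
    return [avg_word_len, type_token_ratio, punct_ratio]


def augment(embedding: List[float], text: str) -> List[float]:
    """Concatenate a contextual embedding with the hand-crafted features."""
    return embedding + stylometry_features(text) + [shannon_entropy(text)]


if __name__ == "__main__":
    # Placeholder vector standing in for a pooled XLM-RoBERTa representation.
    fake_embedding = [0.12, -0.40, 0.33]
    vector = augment(fake_embedding, "Breaking: miracle cure found!!!")
    print(len(vector), vector)
```

In a real pipeline the hand-crafted features would typically be scaled before concatenation so that they are comparable in magnitude to the embedding dimensions; that step is left out here for brevity.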