Annotated dataset for the fake news classification in Slovak language

Authors: Viera Maslej-Kresnakova, Martin Sarnovsky, Nikola Hrabovska
Year of publication: 2020
Subject:
Source: 2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA).
DOI: 10.1109/iceta51985.2020.9379254
Description: Fake news detection is currently an active field of research. Detection methods based on natural language processing and machine learning are being developed to automatically identify possible misinformation in news articles. Training these models successfully requires annotated data. For English, multiple human-annotated datasets are already available and widely used in research. The main objective of the work presented in this paper was to create a similar dataset consisting of articles in the Slovak language. We collected the data from various local news portals, including reputable publishers as well as suspicious conspiracy portals. To obtain the annotations, we used a crowdsourcing approach. The annotated dataset was used in preliminary experiments, in which a neural network classifier was trained and evaluated.
Database: OpenAIRE