Comparison of Machine Learning Approaches for Sentiment Analysis in Slovak.

Authors: Sokolová, Zuzana; Harahus, Maroš; Juhár, Jozef; Pleva, Matúš; Staš, Ján; Hládek, Daniel
Subject:
Source: Electronics (2079-9292); Feb 2024, Vol. 13, Issue 4, p. 703, 21 pp.
Abstract: Sentiment analysis is the process of determining and understanding the emotional tone expressed in a text, identifying whether the overall sentiment is positive, negative, or neutral. Sentiment analysis on social networks seeks valuable insight into public opinions, trends, and user sentiments. The main motivation is to enable businesses and researchers to make informed decisions and to understand the dynamics of online discourse. Additionally, sentiment analysis plays a vital role in hate speech detection, aiding in the identification and mitigation of harmful content on social networks. This paper introduces studies on the sentiment analysis of texts in the Slovak language, as well as in other languages. The primary aim of the paper, aside from releasing the "SentiSK" dataset to the public, is to evaluate our dataset by comparing its results with those of other existing datasets in the Slovak language. The "SentiSK" dataset, consisting of 34,006 comments, was created, specified, and annotated for the task of sentiment analysis. The proposed approach used three datasets in the Slovak language, with nine classification methods trained and compared in two defined tasks. In the first task, testing on the "SentiSK" and "Sentigrade" datasets involved three classes (positive, neutral, and negative). In the second task, testing on the "SentiSK", "Sentigrade", and "Slovak dataset for SA" datasets involved two classes (positive and negative). Selected models achieved an F1 score ranging from 75.35% to 95.04%. [ABSTRACT FROM AUTHOR]
Database: Complementary Index