Developing a code-mixed sentiment analysis dataset of Xitsonga-English music review.

Authors: Nkuna, Blessing; Modipa, Thipe I.; Ramalepe, Simon P.
Subject:
Source: Journal of the Digital Humanities Association of Southern Africa (DHASA); 2024, Vol. 5 Issue 1, p1-7, 7p
Abstract: Sentiment analysis is the process of classifying the sentiment of a text as positive, negative or neutral. Code-mixed sentiment analysis refers to classifying the sentiment of text that contains two or more languages. Few sentiment analysis studies have addressed South African code-mixed languages, owing to the absence of annotated datasets. The purpose of the study was to collect code-mixed text data for the Xitsonga-English language pair. The study collected Xitsonga-English code-mixed comments on music reviews from a YouTube channel. After the data was collected, it was tokenized using the Python library Natural Language Toolkit (NLTK). Subsequently, we analyzed the comments for the presence of code-mixing. The collected Xitsonga-English code-mixed data would be suitable for building a sentiment analysis model. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
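
Note: The abstract describes tokenizing the collected comments and then checking them for code-mixing. A minimal sketch of that idea follows, under loud assumptions: it is not the authors' pipeline, it uses a standard-library regular expression as a stand-in for an NLTK tokenizer, and the word lists, function names, and sample comments are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, hand-picked word lists for illustration only (not from the
# study); real work would need proper lexicons or a language-ID model.
XITSONGA_WORDS = {"ndza", "yi", "rhandza", "risimu", "leyi", "swinene", "inkomu"}
ENGLISH_WORDS = {"i", "love", "this", "song", "the", "is", "so", "nice", "beat", "very"}


def tokenize(comment):
    """Lowercase a comment and split it into word tokens.

    A regex stand-in for an NLTK tokenizer, used so the sketch has no
    third-party dependencies.
    """
    return re.findall(r"[a-z']+", comment.lower())


def is_code_mixed(comment):
    """Return True if the comment has at least one token from each word list."""
    tokens = tokenize(comment)
    has_xitsonga = any(t in XITSONGA_WORDS for t in tokens)
    has_english = any(t in ENGLISH_WORDS for t in tokens)
    return has_xitsonga and has_english


# Invented sample comments, not data from the study.
comments = [
    "Ndza yi rhandza risimu leyi, the beat is so nice!",
    "I love this song",
    "Ndza yi rhandza risimu leyi swinene",
]

# Keep only the comments that contain words from both languages.
mixed = [c for c in comments if is_code_mixed(c)]
print(mixed)
```

A lexicon lookup like this is deliberately simple: it misses words outside the lists and cannot handle words shared by both languages, so it should be read as an illustration of the filtering step rather than a usable detector.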