Machine Learning-Based Text Classification Comparison: Turkish Language Context

Authors:	Yehia Ibrahim Alzoubi, Ahmet E. Topcu, Ahmed Enis Erkaya
Language:	English
Year of publication:	2023
Subject:	Turkish texts; machine learning; text preprocessing; algorithm effectiveness; Technology; Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999)
Source:	Applied Sciences, Vol 13, Iss 16, p 9428 (2023)
Document type:	article
ISSN:	2076-3417
DOI:	10.3390/app13169428
Description:	The growth in textual data that accompanies the increased use of online services, together with the ease of accessing these data, has led to a rise in the number of text classification research papers. Text classification is widely used in domains such as news categorization, spam detection, and sentiment analysis. This work focuses on the classification of Turkish text, since only a few studies have been conducted in this context. To evaluate the proposed techniques, we use data obtained from customer inquiries submitted to an institution. These inquiries are assigned to classes specified in the institution's internal procedures. The Support Vector Machine, Naïve Bayes, Long Short-Term Memory (LSTM), Random Forest, and Logistic Regression algorithms were used to classify the data. The performance of each technique was analyzed before and after data preparation, and the results were compared. The LSTM technique achieved the highest accuracy, 84%, surpassing the best traditional technique, the Support Vector Machine, at 78%. All techniques performed better once the number of categories in the dataset was reduced. Moreover, the findings show that data preparation and the match between the number of classes and the number of training samples are significant factors influencing the techniques' performance. The findings of this study and the text classification technique used may also be applied to data in languages other than Turkish.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/4f6b822d0c6043ce83d604b0abbae6f1
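The record does not describe how the preprocessing or the five classifiers were implemented. The following is only a minimal, stdlib-only sketch of the general pipeline the abstract names: Turkish-aware text preparation followed by one of the listed techniques, Naïve Bayes (here assumed to be multinomial with Laplace smoothing). The helpers `turkish_lower` and `preprocess`, the `MultinomialNB` class, the `billing`/`technical` labels, and the example inquiries are all hypothetical illustrations, not the authors' code or data, and the toy accuracy has no relation to the 84% and 78% figures reported above.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware: also matches ç, ğ, ı, ö, ş, ü


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish dotted/dotless i rules.

    str.lower() alone maps "I" to "i" (Turkish expects "ı") and "İ" to
    "i" plus a combining dot, so both capitals are mapped explicitly first.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text: str) -> list[str]:
    """Hypothetical preparation step: Turkish lowercasing + word tokens."""
    return TOKEN_RE.findall(turkish_lower(text))


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_one(self, tokens: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, docs: list[list[str]]) -> list[str]:
        return [self.predict_one(d) for d in docs]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical customer inquiries; not data from the study.
    train_texts = [
        "Faturamı ödeyemiyorum",
        "Fatura tutarı yanlış",
        "İnternet bağlantım kesildi",
        "Modem çalışmıyor, İnternet yok",
    ]
    train_labels = ["billing", "billing", "technical", "technical"]
    model = MultinomialNB().fit([preprocess(t) for t in train_texts], train_labels)

    test_texts = ["Fatura neden yüksek", "Modem ışıkları yanmıyor"]
    test_labels = ["billing", "technical"]
    predictions = model.predict([preprocess(t) for t in test_texts])
    print(predictions, accuracy(test_labels, predictions))  # → ['billing', 'technical'] 1.0
```

The explicit dotted/dotless i mapping is there because plain `"IŞIK".lower()` yields `"işik"`, which would not match tokens written as `ışık`; language-specific preparation of this kind is one plausible reason data preparation affects accuracy, although the record does not say which steps the authors applied.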