Hoax Classification with Term Frequency - Inverse Document Frequency Using Non-Linear SVM and Naïve Bayes.

Author: Kesumawati, Ayundyah; Thalib, Achmad Kurniansyah
Subject:
Source: International Journal of Advances in Soft Computing & Its Applications; 2018, Vol. 10 Issue 3, p115-128, 14p
Abstract: In recent years, the way modern society obtains information on the internet has become a crucial issue. News spreads very easily, which makes the information very difficult to filter. The flow of information brings broad benefits to society, but it can also affect the psychological and social integrity of the nation. Information that is easily obtained can be extremely dangerous in terms of validity and is not uncommonly a hoax. The dataset used in this research was collected from the news website detik.com and from turnbackhoax.id. This research compares two methods: the Naïve Bayes Classifier (NBC) and the Support Vector Machine (SVM) with a Radial Basis Function kernel. It uses Term Frequency - Inverse Document Frequency Weighting (TF-IDFW), which separates each word to make the text easier to analyze for classification. With a training set of 1,480 and a test set of 369, NBC obtained an accuracy of 85.09% and SVM obtained an accuracy of 83.74%. In addition, combining this information with text mining shows that the most frequent keyword for the news category is "Price", followed by "KPK", "Stock", "Indonesia", "DPR", and "Police". For the hoax category, the most frequent word is "Price", followed by "KPK", "Stock", "Indonesia", "DPR", and "Police". [ABSTRACT FROM AUTHOR]
Database: Complementary Index
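
Note: The abstract describes weighting words with TF-IDF and classifying documents with Naïve Bayes. The following is a minimal sketch of that idea, not the authors' implementation. The toy corpus, the function names, the whitespace tokenizer, the smoothed IDF formula, and the Laplace smoothing constant are all assumptions made for illustration. The SVM with a Radial Basis Function kernel is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def fit_idf(docs):
    # One common smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Map a document to {term: raw count * idf}; unseen terms are dropped.
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train_nb(docs, labels, idf, alpha=1.0):
    # Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing.
    class_counts = Counter(labels)
    weights = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weights[label][term] += w
    model = {}
    for label, n_docs in class_counts.items():
        denom = sum(weights[label].values()) + alpha * len(idf)
        log_lik = {
            t: math.log((weights[label].get(t, 0.0) + alpha) / denom)
            for t in idf
        }
        model[label] = (math.log(n_docs / len(docs)), log_lik)
    return model


def predict(doc, model, idf):
    # Pick the class with the highest log prior plus weighted log likelihood.
    vec = tfidf(doc, idf)
    scores = {
        label: prior + sum(w * log_lik[t] for t, w in vec.items())
        for label, (prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus; the study itself used detik.com and turnbackhoax.id.
train_docs = [
    "stock price rises after market report",
    "police confirm official statement on price",
    "shocking miracle cure share before deleted",
    "share this shocking secret now",
]
train_labels = ["news", "news", "hoax", "hoax"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)
print(predict("shocking cure share now", model, idf))
print(predict("stock price report", model, idf))
```

In practice the same pipeline would more likely be built with a library that provides TF-IDF vectorization together with Naïve Bayes and RBF-kernel SVM classifiers, and evaluated on a held-out test set as the abstract describes.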