Spam Detection on Profile and Social Media Network using Principal Component Analysis (PCA) and K-means Clustering.

Authors: Sanjaya, Samuel Ady; Surendro, Kridanto
Subject:
Source: International Journal of Advances in Soft Computing & Its Applications; Nov 2019, Vol. 11 Issue 3, p108-123, 16p
Abstract: Social media as a means of communicating in cyberspace continues to grow in number of users, utilization, and resulting impact. Existing social media ecosystems are shaped by public figures, trending topics, and even spam and spammers. Detection of spam accounts has mostly been done with classification, that is, supervised learning. This becomes a problem when the data is new: if the supervised model is not updated, the likelihood of false detection increases. To address this problem, this study uses Principal Component Analysis (PCA) and K-means clustering with Mahalanobis distance to detect groups of users with similar properties and so identify spam. The study uses 150 thousand Twitter data records together with 15 thousand account records, described as graph data. We find that detection errors in classification-based spam detection arise because only two classes are defined: spam and non-spam. There are, however, other classes that show the characteristics of spam without being spam. In this paper, we define 5 clusters: normal, news account and public activist, foreign account, public figure, and spam. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
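
Illustrative sketch: the abstract names PCA followed by K-means clustering with Mahalanobis distance, but it does not give the features, the number of retained components, or the exact distance variant. The Python example below is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a numeric per-account feature matrix (replaced here by synthetic data), two retained components, and a per-cluster covariance estimate for the Mahalanobis distance. The function names pca_project, mahalanobis_sq, and kmeans_mahalanobis are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def mahalanobis_sq(Z, center, cov):
    """Squared Mahalanobis distance of each row of Z from center."""
    diff = Z - center
    inv = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv, diff)


def kmeans_mahalanobis(Z, k, n_iter=50, reg=1e-6, seed=0):
    """K-means-style loop using a per-cluster Mahalanobis distance.

    Assumption: each cluster keeps its own covariance estimate; the
    abstract does not say which covariance the authors used.
    """
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    centers = Z[rng.choice(n, size=k, replace=False)].copy()
    covs = np.stack([np.eye(d)] * k)  # identity = Euclidean on first pass
    labels = np.full(n, -1)
    for _ in range(n_iter):
        dists = np.stack(
            [mahalanobis_sq(Z, centers[j], covs[j]) for j in range(k)], axis=1
        )
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = Z[labels == j]
            if len(members) > d:  # enough points to estimate a covariance
                centers[j] = members.mean(axis=0)
                covs[j] = np.cov(members, rowvar=False) + reg * np.eye(d)
    return labels, centers


# Synthetic stand-in for per-account features (hypothetical, not the paper's data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 6)) for c in (0, 3, 6, 9, 12)])

Z = pca_project(X, n_components=2)            # reduce 6 features to 2 components
labels, centers = kmeans_mahalanobis(Z, k=5)  # 5 clusters, as in the abstract
print(np.bincount(labels, minlength=5))       # cluster sizes
```

The choice of k=5 mirrors the five clusters listed in the abstract. Deciding which cluster corresponds to spam would still require inspecting the clusters, since the algorithm itself assigns no meaning to them.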