Chinese Spam Classifier based on Positive and Unlabeled Learning

Author: LIN, CHI (林奇)
Year of publication: 2019
Document type: thesis
Description: 107
With the rapid development of social networking and communication devices, and the global push toward paperless operation for energy conservation and carbon reduction, e-mail has become an indispensable communication channel in both life and work. E-mail is simple to use, fast to deliver, and cheaper than traditional paper letters. However, its wide adoption also attracts abuse: some senders distribute advertisements in bulk to push users into purchasing goods and thereby gain large profits, and attackers use the same channel for cyberattacks that infect users' computers, steal data, and so on. Because labeled mail is not easy to obtain in real life, this thesis uses the positive and unlabeled (PU) learning method to construct a spam classifier. By combining a small amount of positive data with a large amount of unlabeled data, reliable negative data can be extracted from the unlabeled data. Random forest, logistic regression, and Naïve Bayes algorithms are then used to build classifiers for spam detection, and the results are finally used to demonstrate the validity of positive and unlabeled learning.
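The abstract does not say how the reliable negatives are extracted, so the following is only a minimal sketch of one common two-step PU scheme, not the thesis's implementation. It assumes that spam is the positive class, that messages are already segmented into tokens, and that a small hand-written Naïve Bayes scorer is an acceptable stand-in for the three classifiers named above. The function names, the `keep_fraction` parameter, and the toy messages are all hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Collect per-class token counts for a multinomial Naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab

def spam_log_odds(model, tokens):
    """Log-odds of class 1 (spam) versus class 0, with Laplace smoothing."""
    counts, n_docs, vocab = model
    score = math.log(n_docs[1] / n_docs[0])
    totals = {y: sum(counts[y].values()) + len(vocab) for y in (0, 1)}
    for t in tokens:
        if t in vocab:  # tokens never seen in training are ignored
            score += math.log((counts[1][t] + 1) / totals[1])
            score -= math.log((counts[0][t] + 1) / totals[0])
    return score

def pu_learn(positive, unlabeled, keep_fraction=0.5):
    """Two-step PU learning: extract reliable negatives, then retrain."""
    # Step 1: provisionally treat every unlabeled message as negative.
    first = train_nb(positive + unlabeled,
                     [1] * len(positive) + [0] * len(unlabeled))
    # Step 2: the unlabeled messages that look least like spam
    # become the reliable negatives (keep_fraction is an assumed knob).
    ranked = sorted(unlabeled, key=lambda d: spam_log_odds(first, d))
    n_keep = max(1, int(len(unlabeled) * keep_fraction))
    reliable_neg = ranked[:n_keep]
    # Step 3: train the final classifier on positives vs. reliable negatives.
    final = train_nb(positive + reliable_neg,
                     [1] * len(positive) + [0] * len(reliable_neg))
    return final, reliable_neg

# Toy, pre-segmented messages invented for illustration only.
positive = [["免費", "優惠", "點擊"],
            ["免費", "中獎", "點擊"],
            ["優惠", "中獎", "免費"]]
unlabeled = [["會議", "報告", "明天"],
             ["免費", "優惠", "中獎"],   # hidden spam inside the unlabeled set
             ["報告", "附件", "會議"],
             ["明天", "會議", "附件"]]

final, reliable_neg = pu_learn(positive, unlabeled)
print("reliable negatives:", reliable_neg)
print("spam-like message flagged:", spam_log_odds(final, ["免費", "點擊"]) > 0)
print("work-like message flagged:", spam_log_odds(final, ["會議", "報告"]) > 0)
```

In this sketch the hidden spam message scores highest among the unlabeled messages in step 1, so it is left out of the reliable negatives. In practice the final model in step 3 could be any supervised learner, such as the random forest or logistic regression mentioned in the abstract.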
Database: Networked Digital Library of Theses & Dissertations