Classification Spam Email with Elimination of Unsuitable Features with Hybrid of GA-Naive Bayes
Author: | F. Ahmadzadeh, O. M. E. Ebadati |
---|---|
Year of publication: | 2019 |
Subject: | Service (systems architecture); Computer Networks and Communications; Email spam; Computer science; Library and Information Sciences; Machine learning; Computer Science Applications; Naive Bayes classifier; Genetic algorithm; Artificial intelligence |
Source: | Journal of Information & Knowledge Management. 18:1950008 |
ISSN: | 1793-6926 0219-6492 |
Description: | Email spam is a security problem that has been addressed with a variety of machine learning techniques. Its rise makes organisational email services unreliable and directly exposes clients to threats delivered through unexpected spam, such as ransomware. There are several methods for identifying spam emails. Most of these methods focus on feature selection; however, such models have decreased detection accuracy. This paper proposes a novel spam detection method that not only avoids decreasing accuracy but also eliminates unsuitable features with less processing. The features are terms drawn from the email contents, and because the number of features is very large, eliminating unsuitable ones can reduce memory complexity. We use the text email samples from Hewlett-Packard (HP) Laboratories. First, a genetic algorithm (GA) is employed to select features, without a fixed limit on the number of features selected, using Bayesian theory as the fitness function; it is checked with different numbers of repetitions. The GA result improved as the number of repetitions increased, and it was tested with two distinct selection methods, random selection and tournament selection. In the second stage, Naive Bayes classifies the emails in the dataset as Spam or Ham. The results show that Naive Bayes and hybrid GA-Naive Bayes are almost identical, but GA-Naive Bayes has better performance. |
Database: | OpenAIRE |
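
The two-stage approach summarised in the description (GA feature selection scored by a Bayesian fitness function, followed by Naive Bayes classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the Bernoulli Naive Bayes variant, the toy data, the mutation rate, the single-point crossover, the tie-break in favour of fewer features, and seeding the population with the full feature set are all assumptions made for illustration. All function and variable names (`train_nb`, `ga_select`, `TRAIN`, `TEST`, and so on) are hypothetical.

```python
import math
import random

def train_nb(docs, labels, mask):
    """Fit a Bernoulli Naive Bayes model using only features enabled in mask."""
    feats = [j for j, keep in enumerate(mask) if keep]
    model = {}
    for c in sorted(set(labels)):
        rows = [d for d, y in zip(docs, labels) if y == c]
        prior = len(rows) / len(docs)
        # Laplace-smoothed estimate of P(term present | class)
        probs = {j: (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in feats}
        model[c] = (prior, probs)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    best, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for j, p in probs.items():
            score += math.log(p if doc[j] else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best

def fitness(mask, train, test):
    """Accuracy of Naive Bayes restricted to mask; an empty mask scores 0."""
    if not any(mask):
        return 0.0
    model = train_nb(train[0], train[1], mask)
    docs, labels = test
    return sum(predict_nb(model, d) == y for d, y in zip(docs, labels)) / len(docs)

def tournament(pop, scores, rng, k=2):
    """Tournament selection: pick k random individuals, keep the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]

def ga_select(train, test, n_features, generations=20, pop_size=12, seed=0):
    """Search feature masks with a GA; returns (best_mask, best_accuracy)."""
    rng = random.Random(seed)
    # Assumption: include the full feature set so the result is never worse
    # than plain Naive Bayes on the evaluation data.
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scores = [fitness(m, train, test) for m in pop]
        for m, s in zip(pop, scores):
            # Assumption: on equal accuracy, prefer the mask with fewer features.
            if s > best_fit or (s == best_fit and sum(m) < sum(best)):
                best, best_fit = m[:], s
        nxt = []
        while len(nxt) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            cut = rng.randrange(1, n_features)           # single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < 0.05 else g for g in child]  # mutation
            nxt.append(child)
        pop = nxt
    return best, best_fit

# Toy data (invented): columns are presence of four terms; label 1 = Spam, 0 = Ham.
TRAIN = ([[1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 0, 1], [0, 1, 0, 0],
          [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 1, 1], [0, 0, 0, 0]],
         [1, 1, 1, 1, 0, 0, 0, 0])
TEST = ([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 0]],
        [1, 0, 1, 0])

best_mask, best_acc = ga_select(TRAIN, TEST, n_features=4)
print("selected features:", best_mask, "accuracy:", best_acc)
```

In a real experiment the fitness would be measured on a validation split kept separate from the final test set, and the number of generations would play the role of the "number of repetitions" mentioned in the description.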