Classification of Landing and Distribution Domains Using Whois’ Text Mining

Authors:	Yukiko Sawaya, Ayumu Kubota, Akira Yamada, Jumpei Urakawa, Tran Phuong Thao, Kosuke Murakami
Year of publication:	2017
Subject:	021110 strategic defence & security studies; Focus (computing); Exploit; Computer science; 0211 other engineering and technologies; Decision tree; 020206 networking & telecommunications; 02 engineering and technology; Computer security; computer.software_genre; Support vector machine; Landing page; Web page; 0202 electrical engineering, electronic engineering, information engineering; Malware; F1 score; computer
Source:	TrustCom/BigDataSE/ICESS
DOI:	10.1109/trustcom/bigdatase/icess.2017.213
Description:	Detection of drive-by download attacks has become a focus of security research since the attack has turned into the most popular and serious threat to web infrastructure. The attack exploits vulnerabilities in web browsers and their extensions to download malicious software unnoticed. Often, the victim is sent through a long chain of redirection operations in order to take down the offending pages. Concretely, the attack is triggered when a user visits a benign webpage (called the landing page) that has been compromised by the attacker, who has inserted malicious code into it. The user is then automatically redirected, without his/her consent or knowledge, to the actual page that installs malware on the user's computer (called the distribution page). While a large body of work targets the detection of drive-by download attacks, little attention has been paid to the redirection, which is a crucial characteristic of the attack. In this paper, for the first time, we propose an approach to the classification of landing and distribution domains, which are important components forming the head and tail of a redirection chain in the attack. Our methodology is to use machine learning for text mining on the registered information of the domains, called whois. We implemented our approach with six popular supervised learning algorithms, compared the results, and concluded that the linear Support Vector Machine and the CART-based Decision Tree are the best models for our dataset, giving respectively 98.55% and 99.28% accuracy, 97.78% and 98.95% F1 score, and 98.35% and 99.45% average precision.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::eb0a8aa74e832ad47a189ba7426e306f https://doi.org/10.1109/trustcom/bigdatase/icess.2017.213
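The description above outlines text mining on whois records followed by supervised classification. The following is a minimal sketch of that general idea, not the authors' implementation: it turns whois text into bag-of-words counts and trains a simple perceptron, which is used here only as a dependency-free stand-in for the linear Support Vector Machine and CART Decision Tree named in the record. The whois records, field values, label encoding, and all function names are hypothetical and invented for illustration.

```python
import re
from collections import Counter

# Assumed label encoding (not from the paper).
LANDING, DISTRIBUTION = -1, 1

def bag_of_words(whois_text):
    """Lowercase a whois record and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", whois_text.lower()))

def score(weights, bias, feats):
    """Linear decision value: bias plus weighted token counts."""
    return bias + sum(weights[tok] * cnt for tok, cnt in feats.items())

def train_perceptron(records, labels, epochs=10):
    """Perceptron stand-in for a linear classifier (not an SVM)."""
    weights, bias = Counter(), 0
    for _ in range(epochs):
        for text, y in zip(records, labels):
            feats = bag_of_words(text)
            if y * score(weights, bias, feats) <= 0:  # misclassified
                for tok, cnt in feats.items():
                    weights[tok] += y * cnt
                bias += y
    return weights, bias

def predict(model, whois_text):
    weights, bias = model
    s = score(weights, bias, bag_of_words(whois_text))
    return DISTRIBUTION if s > 0 else LANDING

def f1_score(y_true, y_pred, positive=DISTRIBUTION):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Invented toy whois records; real records are far longer and noisier.
train_records = [
    "Registrar: Example Registrar Inc\nRegistrant Organization: Example Bakery Ltd\nRegistrant Country: JP",
    "Registrar: Example Registrar Inc\nRegistrant Organization: Example School\nRegistrant Country: US",
    "Registrar: Sample Names LLC\nRegistrant Organization: Privacy Protect\nRegistrant Country: PA",
    "Registrar: Sample Names LLC\nRegistrant Organization: Privacy Shield\nRegistrant Country: PA",
]
train_labels = [LANDING, LANDING, DISTRIBUTION, DISTRIBUTION]

test_records = [
    "Registrar: Example Registrar Inc\nRegistrant Organization: Example Clinic\nRegistrant Country: JP",
    "Registrar: Sample Names LLC\nRegistrant Organization: Privacy Guard\nRegistrant Country: PA",
]
test_labels = [LANDING, DISTRIBUTION]

model = train_perceptron(train_records, train_labels)
predictions = [predict(model, r) for r in test_records]
print("predictions:", predictions)
print("F1:", f1_score(test_labels, predictions))
```

With a real dataset, the same bag-of-words features could be passed to an off-the-shelf linear SVM or decision tree implementation; the toy data here is far too small to say anything about the accuracy figures reported in the record.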