Mining Actionable Information from Security Forums: The Case of Malicious IP Addresses

Author:	Konstantinos Pelechrinis, Andre Castro, Joobin Gharibshah, Evangelos E. Papalexakis, Tai Ching Li, Michalis Faloutsos
Year of publication:	2019
Subject:	Focus (computing); Data collection; Computer science; 020206 networking & telecommunications; 02 engineering and technology; Blacklist; Constructed language; World Wide Web; Identification (information); 0202 electrical engineering, electronic engineering, information engineering; Key (cryptography); Feature (machine learning); 020201 artificial intelligence & image processing; Hacker
Source:	Lecture Notes in Social Networks ISBN: 9783030112851
DOI:	10.1007/978-3-030-11286-8_9
Description:	The goal of this work is to systematically extract information from hacker forums, whose content is generally unstructured: the text of a post does not necessarily follow any writing rules. By contrast, many security initiatives and commercial entities harness readily available public information, but they seem to focus on structured sources. Here, we focus on the problem of identifying malicious IP addresses among the IP addresses that are reported in the forums. We develop a method to automate the identification of malicious IP addresses, with the design goal of being independent of external sources. A key novelty is that we use a matrix decomposition method to extract latent features of the users’ behavioral information, which we combine with textual information from the related posts. A key design feature of our technique is that it can be readily applied to forums in different languages, since it does not require a sophisticated natural language processing approach. In particular, our solution needs only a small number of keywords in the new language plus the user’s behavior as captured by specific features. We also develop a tool to automate data collection from security forums. Using our tool, we collect approximately 600K posts from three different forums. Our method exhibits high classification accuracy, and the precision of identifying malicious IP addresses in posts is greater than 88% in all three forums. We argue that our method can provide significantly more information: we find up to three times more potentially malicious IP addresses than the reference blacklist VirusTotal. As cyber-wars become more intense, early access to useful information becomes more imperative for removing the hackers’ first-move advantage, and our work is a solid step in this direction.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::a72753eac23b7fd9672d5cc8cdccc471 https://doi.org/10.1007/978-3-030-11286-8_9