InferIP

Authors: Joobin Gharibshah, Konstantinos Pelechrinis, Andre Castro, Tai Ching Li, Maria Vanrell, Evangelos E. Papalexakis, Michalis Faloutsos
Year of publication: 2017
Subject:
Source: ASONAM
DOI: 10.1145/3110025.3110055
Description: How much useful information can we extract from security forums? Many security initiatives and commercial entities are harnessing readily available public information, but they seem to focus on structured sources of information. Our goal here is to extract information from hacker forums, where information is provided in ad hoc and unstructured ways. Specifically, we focus on the problem of identifying malicious IP addresses when they are reported in the forums. We develop a method to automate the identification of malicious IPs, with the design goal of being independent of external sources. A key novelty is that we use a matrix decomposition method to extract latent features of the users' behavioral information, which we combine with textual information from the related posts. As a key design feature, our technique can be applied to forums in different languages, since it relies on a simple NLP solution in combination with behavioral features. In particular, our solution needs only a small number of keywords in the new language plus the users' behavior captured by specific features. We also develop a tool to automate data collection from security forums, and we collect approximately 600K posts from 3 different forums. Our method exhibits high classification accuracy, and the precision of identifying malicious IPs in posts is greater than 88% on all three sites. Furthermore, by applying our method, we find up to 3 times more potentially malicious IPs than the reference blacklist VirusTotal. As cyber-wars become more intense, early access to useful information becomes ever more imperative for removing the hackers' first-move advantage, and our work is a solid step in this direction.
Database: OpenAIRE
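The abstract describes the pipeline only at a high level: find IP addresses in forum posts, build a few keyword features from the post text, derive latent features of each author's behavior through a matrix decomposition, and classify. The following Python sketch illustrates that flow under stated assumptions and is not the authors' implementation. The keyword list, the use of a truncated SVD (computed here by power iteration) as the decomposition, and logistic regression as the classifier are all assumptions, and the names `extract_ips`, `keyword_features`, `latent_user_features`, `train_logistic` and `predict` are hypothetical.

```python
import ipaddress
import math
import re

# Dotted-quad candidates; validity is checked with the ipaddress module.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical English keyword list: the abstract says only that a small
# number of keywords per language is needed, not which ones.
KEYWORDS = ("malware", "botnet", "attack", "ddos", "scan")


def extract_ips(text):
    """Return the valid IPv4 addresses mentioned in a post, in order."""
    ips = []
    for candidate in IP_PATTERN.findall(text):
        try:
            ips.append(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            pass  # e.g. 999.1.1.1 matches the pattern but is not an address
    return ips


def keyword_features(text):
    """Count how often each (hypothetical) keyword occurs in the post."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(k)) for k in KEYWORDS]


def latent_user_features(behavior, k, iters=200):
    """Project each user's behavior row onto the top-k right singular vectors.

    behavior is a users x behavioral-features matrix (list of lists). A
    truncated SVD, computed by power iteration with deflation on A^T A, is an
    assumed stand-in for the matrix decomposition used in the paper.
    """
    n = len(behavior[0])
    gram = [[sum(r[i] * r[j] for r in behavior) for j in range(n)] for i in range(n)]
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # v lies in the null space of the remaining Gram matrix
            v = [x / norm for x in w]
        eig = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        components.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        gram = [[gram[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[sum(r[j] * v[j] for j in range(n)) for v in components] for r in behavior]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            gw = [g + err * xv for g, xv in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(w, b, x):
    """Flag a post as reporting a malicious IP when P(malicious) >= 0.5."""
    return sum(a * c for a, c in zip(w, x)) + b >= 0.0
```

In this sketch, a post's feature vector would be `keyword_features(text) + latent[author_index]`, and only posts for which `extract_ips` returns at least one address would need to be classified. Because only `KEYWORDS` is language-specific, moving to a forum in another language would mean swapping that tuple while keeping the behavioral features, which mirrors the design goal stated in the abstract.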