A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence
Author: | Christos Tryfonopoulos, Spiros Skiadopoulos, Paris Koloveas, Thanasis Chantzios |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Computer Science - Cryptography and Security; Computer science; Rank (computer programming); 02 engineering and technology; Crawling; Social web; Task (project management); Machine Learning (cs.LG); World Wide Web; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Relevance (information retrieval); Architecture; Web crawler; Cryptography and Security (cs.CR); Hacker |
Source: | 2019 IEEE World Congress on Services (SERVICES) |
Description: | The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information that, given the appropriate tools and methods, may be identified, crawled, and subsequently turned into actionable cyber-threat intelligence. In this work, we focus on the information-gathering task and present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web. The proposed architecture adopts a two-phase approach to data harvesting. In the first phase, a machine learning-based crawler directs the harvesting towards websites of interest; in the second phase, state-of-the-art statistical language modelling techniques represent the harvested information in a latent low-dimensional feature space and rank it by its potential relevance to the task at hand. The proposed architecture is realised using exclusively open-source tools, and a preliminary evaluation with crowdsourced results demonstrates its effectiveness. 6 pages, 2 figures |
Database: | OpenAIRE |