Automated Model Learning for Accurate Detection of Malicious Digital Documents

Authors:	Craig Miles, Daniel Scofield, Stephen Kuhn
Year of publication:	2020
Subject:	Computer Networks and Communications; Computer science; Machine learning; Semantic equivalence; Randomness; Training set; Computer Science Applications; Hardware and Architecture; Scalability; Model learning; Malware; Anomaly detection; Artificial intelligence; Safety Research; Classifier (UML); Software; Information Systems; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems; 0211 other engineering and technologies; 021110 strategic defence & security studies
Source:	Digital Threats: Research and Practice. 1:1-21
ISSN:	2576-5337 2692-1626
Description:	Modern cyber attacks are often conducted by distributing digital documents that contain malware. The approach detailed herein, which consists of a classifier that uses features derived from dynamic analysis of a document viewer as it renders the document in question, is capable of classifying the disposition of digital documents with greater than 98% accuracy even when its model is trained on just small amounts of data. To keep the classification model itself small and thereby to provide scalability, we employ an entity resolution strategy that merges syntactically disparate features that are thought to be semantically equivalent but vary due to programmatic randomness. Entity resolution enables construction of a comprehensive model of benign functionality using relatively few training documents, and the model does not improve significantly with additional training data. In particular, we describe and quantitatively evaluate a fully automated, document-format-agnostic approach for learning a classification model that provides efficacious malicious document detection.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d4f3bdacde6e9809b0385706d65f1d4a https://doi.org/10.1145/3379505