Authors:
Hernandez-Suarez A; Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico.
Sanchez-Perez G; Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico.
Toscano-Medina LK; Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico.
Perez-Meana H; Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico.
Olivares-Mercado J; Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico.
Portillo-Portillo J; Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico.
Benitez-Garcia G; Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.
Sandoval Orozco AL; Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain.
García Villalba LJ; Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain.
Abstract:
In recent years, cybersecurity has been strengthened through the adoption of processes, mechanisms and rapid sources of indicators of compromise in critical areas. Among the most persistent challenges are the detection, classification and eradication of malware and Denial-of-Service (DoS) cyber-attacks. The literature has presented different ways to obtain and evaluate instances related to malware and DoS cyber-attacks, either from a technical point of view or by offering ready-to-use datasets. However, acquiring fresh, up-to-date samples requires an arduous process of exploration, sandbox configuration and mass storage, which may ultimately yield an unbalanced or under-represented set. Synthetic sample generation has been shown to reduce both the cost of setting up controlled environments and the time spent evaluating samples. Nevertheless, such generation is typically performed on observations that already belong to a characterized set, entirely detached from a real environment. To address this limitation, this work proposes a methodology for generating synthetic samples of malicious Portable Executable binaries and DoS cyber-attacks. Generation is driven by a Reinforcement Learning engine that learns from a baseline of different malware families and DoS cyber-attack network properties, producing new, mutated and highly functional samples. Experimental results demonstrate that the generated outputs adapt well as new input datasets for different Machine Learning algorithms.