Bayesian Hyper-Parameter Optimisation for Malware Detection

Autor: Fahad T. ALGorain, John A. Clark
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: Electronics; Volume 11; Issue 10; Pages: 1640
ISSN: 2079-9292
DOI: 10.3390/electronics11101640
Popis: Malware detection is a major security concern and has been the subject of a great deal of research and development. Machine learning is a natural technology for addressing malware detection, and many researchers have investigated its use. However, the performance of machine learning algorithms often depends significantly on parametric choices, so the question arises as to what parameter choices are optimal. In this paper, we investigate how best to tune the parameters of machine learning algorithms—a process generally known as hyper-parameter optimisation—in the context of malware detection. We examine the effects of some simple (model-free) ways of parameter tuning together with a state-of-the-art Bayesian model-building approach. Our work is carried out using Ember, a major published malware benchmark dataset of Windows Portable Execution metadata samples, and a smaller dataset from kaggle.com (also comprising Windows Portable Execution metadata). We demonstrate that optimal parameter choices may differ significantly from default choices and argue that hyper-parameter optimisation should be adopted as a ‘formal outer loop’ in the research and development of malware detection systems. We also argue that doing so is essential for the development of the discipline since it facilitates a fair comparison of competing machine learning algorithms applied to the malware detection problem.
Databáze: OpenAIRE