Impact of Dataset Representation on Smartphone Malware Detection Performance

Authors: Jean-Marc Robert, Chamseddine Talhi, Abdelfattah Amamra
Year of publication: 2013
Subject:
Source: Trust Management VII ISBN: 9783642383229
IFIPTM
DOI: 10.1007/978-3-642-38323-6_12
Description: Improving anomaly-based malware detection techniques for smartphones has been widely studied in recent years. Previous studies explore three factors: dataset size, dataset type and normal profile model. Tuning these factors improves detection performance but increases computational complexity and memory requirements. In this paper we explore a new factor: the dataset representation, that is, the format adopted to organize and represent the data. To investigate the impact of this factor, we examine four machine learning classifiers with three different dataset representations: successive system calls, bag of system calls and pattern frequency of system calls. The dataset is a collection of system call traces from a smartphone running Android 2.2. We analyse the performance of each classifier and deduce the influence of dataset representation on accuracy and false positive rates. The results show that the dataset representation can affect classifier performance at low computational and memory cost.
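The three representations named in the description can be illustrated with a minimal sketch. The record does not define them, so the interpretations below are assumptions: "successive system calls" is taken as sliding windows of consecutive calls, "bag of system calls" as per-call counts with ordering discarded, and "pattern frequency" as counts of each window-length pattern. The function names, the window length of 3 and the sample trace are hypothetical and do not come from the paper.

```python
from collections import Counter


def successive_calls(trace, window=3):
    """Sliding windows of `window` consecutive calls; call order is preserved.

    Assumed reading of "successive system calls"; window=3 is illustrative.
    """
    return [tuple(trace[i:i + window]) for i in range(len(trace) - window + 1)]


def bag_of_calls(trace, vocabulary):
    """Count of each call in `vocabulary`; the order of calls is discarded.

    Assumed reading of "bag of system calls".
    """
    counts = Counter(trace)
    return [counts[name] for name in vocabulary]


def pattern_frequencies(trace, window=3):
    """Number of occurrences of each window-length pattern in the trace.

    Assumed reading of "pattern frequency of system calls".
    """
    return Counter(successive_calls(trace, window))


# Hypothetical trace, not taken from the paper's dataset.
trace = ["open", "read", "read", "write", "close", "open", "read", "read"]
vocabulary = sorted(set(trace))

windows = successive_calls(trace)
bag = bag_of_calls(trace, vocabulary)
patterns = pattern_frequencies(trace)
```

Under these assumptions, the same trace yields a list of ordered windows, a fixed-length count vector, or a sparse pattern-count table, each of which could be fed to a classifier as a different feature format.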
Database: OpenAIRE