Malware Detection Inside App Stores Based on Lifespan Measurements

Autor: Carlos Cilleruelo, Enrique-Larriba, Luis De-Marcos, Jose-Javier Martinez-Herraiz
Jazyk: angličtina
Rok vydání: 2021
Předmět:
Zdroj: IEEE Access, Vol 9, Pp 119967-119976 (2021)
Druh dokumentu: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3107903
Popis: Potentially Harmful Apps (PHAs), like any other type of malware, are a problem. Even though Google tries to maintain a clean app ecosystem, Google Play Store is still one of the main vectors for spreading PHAs. In this paper, we propose a solution based on machine learning algorithms to detect PHAs inside application markets. Being the application markets one of the main entry vectors, a solution capable of detecting PHAs submitted or in submission to those markets is needed. This solution is capable of detecting PHAs inside an application market and can be used as a filtering method, to automatically block the publishing of novel PHAs. The proposed solution is based on application static analysis, and even though several static analysis solutions have been developed, the innovation of this system is based on its training and the creation of its dataset. We have created a new dataset that uses as criteria the lifespan of an application inside Google Play, the shorter time an application is active inside an application market the higher the probability that this is a PHA. This criterion was added in order to avoid the usage and bias of antivirus engines for detecting malware. Involving the lifespan as criteria we created a new method of detection that does not replicate any existing antivirus engines. Experimental results have proved that this solution obtains a 90% accuracy score, using a dataset of 91,203 applications published on the Google Play Store. Despite showing a decrease in accuracy, compared with other machine learning models focused on detecting PHAs; it is necessary to take into account that this is a complementary and different method. The presented work can be combined with other static and dynamic machine learning models, since its training is drastically different, as it was based on lifespan measurements.
Databáze: Directory of Open Access Journals