ApiScout: Robust Windows API Usage Recovery for Malware Characterization and Similarity Analysis

Authors: Plohmann, Daniel; Enders, Steffen; Padilla, Elmar
Language: English
Publication year: 2018
DOI: 10.18464/cybin.v4i1.20
Description: Given today's masses of malware, there is a need for fast analysis and comparison of samples. System API usage has proven to be a very valuable source of information for this, as shown e.g. by Rieck et al. However, the majority of malware samples are shipped packed, making it hard to get accurate information on their payload's API usage. Today's state of the art for obtaining this information from packed samples is to unpack them or dump memory and then reconstruct the imports using tools like ImpREC and Scylla. This has several drawbacks: it is a manual procedure, it requires a live process environment, and it suffers from inaccuracy due to missed dynamic imports. In this paper, we present ApiScout, a fully automated method to recover API usage information from memory dumps. It does not require a live process environment and is capable of handling dynamic imports, leading to more accurate results compared to existing approaches. ApiScout is a two-stage approach. The first stage is a preparation step that creates a database of candidate offsets for API functions. In the second stage, we crawl through a given memory dump of a process and match all possible DWORDs and QWORDs against this database, yielding API reference candidates. We filter and enrich the candidates using different procedures, leading to the desired API usage information. Based on this information, our second contribution in this paper is a concept called ApiVectors, which efficiently stores the information extracted by ApiScout. This enables fast assessment of a malware's potential capabilities and allows similarity analysis of API usage across samples. For the latter, the methods imphash and impfuzzy are the de facto standard. However, they both suffer from inaccuracy because they rely exclusively on the import table, and from non-recoverability of input data. In our approach we use Jaccard and Tanimoto similarity to compare ApiVectors, leading to a much higher accuracy.
Our third contribution is an extensive analysis of API usage across 589 malware families of the Malpedia dataset. The families combined use only about 4500 APIs that can be grouped into 12 semantic groups. The analysis further proves the functionality of ApiScout and shows that ApiVectors clearly outperform imphash and impfuzzy.
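The two stages and the vector comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the DLL base addresses, the export offsets, and the plain bit-list encoding of the vector are all assumptions made for illustration, and the filtering and enrichment of candidates mentioned in the abstract is left out. For binary vectors the Tanimoto coefficient coincides with the Jaccard index, so only one function is shown.

```python
import struct


def build_candidate_db(dll_exports):
    """Stage 1 (sketch): map each absolute API address to (dll, api).

    dll_exports is a hypothetical structure:
    {dll_name: (base_address, {api_name: offset_from_base})}
    """
    db = {}
    for dll, (base, exports) in dll_exports.items():
        for api, offset in exports.items():
            db[base + offset] = (dll, api)
    return db


def scan_dump(dump, db):
    """Stage 2 (sketch): test every DWORD and QWORD in the dump against db.

    Returns unfiltered candidates as (dump_offset, width_in_bytes, (dll, api)).
    """
    hits = []
    for offset in range(len(dump) - 3):
        (dword,) = struct.unpack_from("<I", dump, offset)
        if dword in db:
            hits.append((offset, 4, db[dword]))
        if offset + 8 <= len(dump):
            (qword,) = struct.unpack_from("<Q", dump, offset)
            if qword in db:
                hits.append((offset, 8, db[qword]))
    return hits


def api_vector(api_names, vocabulary):
    """Encode a set of API names as a bit list over a fixed, ordered vocabulary."""
    return [1 if api in api_names else 0 for api in vocabulary]


def jaccard(vec_a, vec_b):
    """Jaccard index of two equal-length bit lists (1.0 if both are empty,
    which is a convention chosen here, not taken from the paper)."""
    both = sum(1 for a, b in zip(vec_a, vec_b) if a and b)
    either = sum(1 for a, b in zip(vec_a, vec_b) if a or b)
    return both / either if either else 1.0


# Made-up addresses and a tiny synthetic "dump" for demonstration only.
exports = {
    "kernel32.dll": (0x76000000, {"CreateFileW": 0x1000}),
    "ntdll.dll": (0x7FF800000000, {"NtClose": 0x2000}),
}
db = build_candidate_db(exports)
dump = (
    b"\x90\x90"
    + struct.pack("<I", 0x76001000)
    + b"\xcc\xcc\xcc\xcc"
    + struct.pack("<Q", 0x7FF800002000)
    + b"\x90"
)
hits = scan_dump(dump, db)

vocabulary = ["CreateFileW", "NtClose", "WriteFile", "connect"]
vec_a = api_vector({"CreateFileW", "NtClose", "WriteFile"}, vocabulary)
vec_b = api_vector({"CreateFileW", "WriteFile", "connect"}, vocabulary)
similarity = jaccard(vec_a, vec_b)
```

In this sketch a 32-bit address followed by four zero bytes would also match as a QWORD, which is one reason a filtering step on the raw candidates is needed in practice.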
The Journal on Cybercrime & Digital Investigations, Vol 4 No 1 (2018): Botconf 2018
Database: OpenAIRE