A Method to Evaluate Program Similarity Using Machine Learning Methods
Author: | Petr Borisov, Yury Kosolapov |
---|---|
Publication year: | 2022 |
Subject: | |
Source: | Proceedings of the Institute for System Programming of the RAS. 34:63-76 |
ISSN: | 2220-6426 2079-8156 |
Description: | This paper considers the problem of constructing an algorithm for comparing two executable files. The algorithm builds a vector of similarity features for a given pair of programs; machine learning methods then use this vector to decide whether the programs are similar or dissimilar. The similarity features are computed by algorithms of two types: universal and specialized. Universal algorithms do not take the format of the input data into account (values of fuzzy hash functions, values of compression ratios). Specialized algorithms work with executable files and analyze machine code (using disassemblers). A total of 15 features were built: 9 of the first type and 6 of the second. Using a constructed training set of similar and dissimilar program pairs, 7 different binary classifiers were trained and tested. The training set was built from coreutils programs. The experiments showed high accuracy for models based on random forest and k-nearest neighbors. They also showed that combining features of both types can improve classification accuracy. |
Database: | OpenAIRE |
External link: |
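
The description mentions compression ratios as one kind of universal, format-agnostic similarity feature. As a rough illustration only, the sketch below computes one such feature, the normalized compression distance, with Python's standard `zlib` module. The function names, the choice of `zlib`, and the two-element feature vector are assumptions made for this example; the record does not say which compressors or formulas the authors used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length of data after zlib compression at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest similar inputs; values near 1 suggest unrelated ones.
    Assumption: zlib stands in for whatever compressor the paper used.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def universal_features(x: bytes, y: bytes) -> list[float]:
    """Hypothetical two-element feature vector for a pair of inputs.

    The paper builds 9 universal features; this sketch shows only two
    compression-based ones chosen for illustration.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    size_ratio = min(cx, cy) / max(cx, cy)  # always in (0, 1]
    return [ncd(x, y), size_ratio]
```

In a setup like the one described, a vector of this kind would be computed for each program pair (for example, from the raw bytes of two executables) and passed, together with the other features, to a binary classifier.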