Binary Code Similarity Detection Method Based on Pre-training Assembly Instruction Representation

Author: WANG Taiyan, PAN Zulie, YU Lu, SONG Jingbin
Language: Chinese
Year of publication: 2023
Subject:
Source: Jisuanji kexue, Vol 50, Iss 4, Pp 288-297 (2023)
Document type: article
ISSN: 1002-137X
DOI: 10.11896/jsjkx.220300271
Description: Binary code similarity detection has been widely used in recent years in vulnerability search, malware detection, advanced program analysis, and other fields. Because program code resembles natural language to some degree, researchers have begun applying pre-training and other natural language processing techniques to improve accuracy. This paper proposes a binary code similarity detection method based on pre-trained assembly instruction representations, addressing the accuracy bottleneck caused by insufficient consideration of instruction probability features. The method comprises a tokenization method for multi-architecture assembly instructions and pre-training tasks that take into account control flow, data flow, instruction logic, and probability of occurrence, yielding better vectorized representations of instructions. The downstream binary code similarity detection task is then combined with the pre-training method to improve accuracy. Experiments show that, compared with existing methods, the proposed method improves instruction representation performance by up to 23.7%, and improves block search ability and similarity detection performance by up to 33.97% and 400%, respectively.
Database: Directory of Open Access Journals