Can We Generate Shellcodes via Natural Language? An Empirical Study

Author:	Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh
Contributors:	Liguori, P., Al-Hossami, E., Cotroneo, D., Natella, R., Cukic, B., Shaikh, S.
Publication year:	2022
Subject:	Automatic exploit generation; Software Engineering (cs.SE); FOS: Computer and information sciences; Computer Science - Software Engineering; Computer Science - Machine Learning; Software exploits; Neural machine translation; Assembly; Shellcode; Software; Machine Learning (cs.LG)
DOI:	10.48550/arxiv.2202.03755
Description:	Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, writing shellcodes is especially time-consuming and technically challenging, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from natural language with high accuracy and that, in many cases, it can generate entire shellcodes with no errors.
Comment:	33 pages, 5 figures, 9 tables. To be published in Automated Software Engineering journal
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4ed3ea24feaec2f5c7514d4137fa3daf