HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

Author:	Muzsai, Lajos; Imolai, David; Lukács, András
Year of publication:	2024
Subject:	Computer Science - Cryptography and Security; Computer Science - Artificial Intelligence; 68M25; I.2.1; K.6.5
Document type:	Working Paper
Description:	We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents. Based on these benchmarks, extensive experiments are presented, analyzing the core parameters of HackSynth, including creativity (temperature and top-p) and token utilization. Multiple open-source and proprietary LLMs were used to measure the agent's capabilities. The experiments show that the agent performed best with the GPT-4o model, better than GPT-4o's system card suggests. We also discuss the safety and predictability of HackSynth's actions. Our findings indicate the potential of LLM-based agents in advancing autonomous penetration testing and the importance of robust safeguards. HackSynth and the benchmarks are publicly available to foster research on autonomous cybersecurity solutions.
Comment:	16 pages, 9 figures
Database:	arXiv
External link:	http://arxiv.org/abs/2412.01778