Don’t Stop Believin’: A Unified Evaluation Approach for LLM Honeypots

Author: Simon B. Weber, Marc Feger, Michael Pilgermann
Language: English
Year of publication: 2024
Source: IEEE Access, Vol 12, Pp 144579-144587 (2024)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3472460
Description: The research area of honeypots is gaining new momentum, driven by advancements in large language models (LLMs). The chat-based applications of generative pretrained transformer (GPT) models seem ideal for use as honeypot backends, especially in request-response protocols like Secure Shell (SSH). By leveraging LLMs, many challenges associated with traditional honeypots, such as high development costs, ease of exposure, and breakout risks, appear to be solved. While early studies have primarily focused on the potential of these models, our research investigates the current limitations of GPT-3.5 by analyzing three datasets of varying complexity. We conducted an expert annotation of over 1,400 request-response pairs, encompassing 230 different base commands. Our findings reveal that while GPT-3.5 struggles to maintain context, incorporating session context into response generation improves the quality of SSH responses. Additionally, we explored whether distinguishing between convincing and non-convincing responses is a metrics issue. We propose a paraphrase-mining approach to address this challenge, which achieved a macro F1 score of 77.85% using cosine distance in our evaluation. This method has the potential to reduce annotation efforts, converge LLM-based honeypot performance evaluation, and facilitate comparisons between new and previous approaches in future research.
Database: Directory of Open Access Journals
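
To make the evaluation idea in the abstract concrete, the following is a minimal sketch of a paraphrase-mining check for LLM honeypot responses: a candidate response is judged convincing if its cosine distance to the embedding of a real command output falls below a threshold, and predictions are scored with macro F1. The embedding model, threshold value, and example data are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: paraphrase-mining with cosine distance, scored by macro F1.
# Assumes the sentence-transformers and scikit-learn packages are installed;
# the model name and THRESHOLD are hypothetical choices, not the authors' setup.
from sentence_transformers import SentenceTransformer, util
from sklearn.metrics import f1_score

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Reference outputs of real SSH commands vs. candidate honeypot responses.
references = ["uid=0(root) gid=0(root) groups=0(root)"]
candidates = [
    "uid=0(root) gid=0(root) groups=0(root)",      # plausible shell output
    "I'm sorry, I cannot execute that command.",    # LLM refusal, not convincing
]
labels = [1, 0]  # expert annotation: 1 = convincing, 0 = not convincing

ref_emb = model.encode(references, convert_to_tensor=True)
cand_emb = model.encode(candidates, convert_to_tensor=True)

# Cosine distance = 1 - cosine similarity; a small distance to any reference
# means the candidate paraphrases a real command output.
THRESHOLD = 0.3  # hypothetical cut-off, to be tuned on annotated data
predictions = []
for emb in cand_emb:
    dist = 1 - util.cos_sim(emb, ref_emb).max().item()
    predictions.append(1 if dist < THRESHOLD else 0)

print("macro F1:", f1_score(labels, predictions, average="macro"))
```

Because the decision reduces to an embedding comparison against a pool of reference outputs, such a scheme could cut per-response annotation effort: only the reference pool and the threshold need human validation, which matches the abstract's claim about reducing annotation efforts.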