Prompt Engineering or Fine-Tuning? A Case Study on Phishing Detection with Large Language Models

Author: Fouad Trad, Ali Chehab
Language: English
Year of publication: 2024
Source: Machine Learning and Knowledge Extraction, Vol 6, Iss 1, Pp 367-384 (2024)
Document type: article
ISSN: 2504-4990
DOI: 10.3390/make6010018
Description: Large Language Models (LLMs) are reshaping the landscape of Machine Learning (ML) application development. The emergence of versatile LLMs capable of undertaking a wide array of tasks has reduced the need for intensive human involvement in training and maintaining ML models. Despite these advancements, a pivotal question emerges: can these generalized models negate the need for task-specific models? This study addresses this question by comparing the effectiveness of LLMs in detecting phishing URLs when used with prompt-engineering techniques versus when fine-tuned. Notably, we explore multiple prompt-engineering strategies for phishing URL detection and apply them to two chat models, GPT-3.5-turbo and Claude 2. In this setting, the best result was an F1-score of 92.74% on a test set of 1000 samples. Following this, we fine-tune a range of base LLMs, including GPT-2, Bloom, Baby LLaMA, and DistilGPT-2 (all primarily developed for text generation) exclusively for phishing URL detection. Fine-tuning achieved the best performance, with an F1-score of 97.29% and an AUC of 99.56% on the same test set, thereby outperforming existing state-of-the-art methods. These results highlight that while LLMs harnessed through prompt engineering can expedite application development and achieve decent performance, they are not as effective as dedicated, task-specific LLMs.
Database: Directory of Open Access Journals
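
As a minimal illustration of the fine-tuning route summarized in the description, the sketch below attaches a binary classification head to DistilGPT-2 with the Hugging Face transformers Trainer and trains it on a toy set of labeled URLs. The toy data, hyperparameters, and the classification-head setup are assumptions made for illustration only; the paper's exact fine-tuning recipe, dataset, and evaluation protocol are not reproduced here.

```python
# Minimal sketch: fine-tuning DistilGPT-2 as a binary phishing-URL classifier.
# Assumes a labeled dataset of (url, label) pairs; the study's actual recipe may differ.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

# Hypothetical toy data; replace with a real phishing-URL dataset.
data = {
    "text": ["http://paypa1-login.example.com/verify", "https://www.wikipedia.org/"],
    "label": [1, 0],  # 1 = phishing, 0 = legitimate
}
ds = Dataset.from_dict(data)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 family has no pad token by default

# Adds a randomly initialized classification head on top of the base LM.
model = AutoModelForSequenceClassification.from_pretrained("distilgpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

def tokenize(batch):
    # URLs are short, so a small max_length is sufficient.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

ds = ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="phishing-distilgpt2",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=ds, tokenizer=tokenizer)
trainer.train()
```

The same pattern applies to the other base models named in the description (GPT-2, Bloom, Baby LLaMA) by swapping the checkpoint name, since the classification head is added on top of whatever causal LM backbone is loaded.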