Research on multi-granularity password analysis based on LLM

Authors:	Meng HONG, Weidong QIU, Yangde WANG
Language:	English, Chinese
Publication year:	2024
Subject:	large language model; password analysis; natural language processing; word segmentation; Electronic computers. Computer science; QA75.5-76.95
Source:	网络与信息安全学报, Vol 10, Pp 112-122 (2024)
Document type:	article
ISSN:	2096-109X
DOI:	10.11959/j.issn.2096-109x.2024008
Description:	Password-based authentication is widely used as the primary authentication mechanism. However, recurring large-scale password leaks have highlighted the vulnerability of passwords to risks such as guessing and theft. In recent years, research on password analysis using natural language processing techniques has progressed, treating passwords as a special form of natural language. Nevertheless, few studies have investigated how the segmentation granularity of password text affects the effectiveness of password analysis with large language models. A multi-granularity password-analyzing framework based on a large language model was proposed; it follows the pre-training paradigm and autonomously learns prior knowledge of password distribution from large unlabelled datasets. The framework comprised three modules: the synchronization network, the backbone network, and the tail network. The synchronization network module implemented char-level, template-level, and chunk-level password segmentation, extracting knowledge of character distribution, structure, word chunk composition, and other password features. The backbone network module constructed a generic password model to learn the rules governing password composition. The tail network module generated candidate passwords for guessing and analyzing target databases. Experimental evaluations were conducted on eight password databases, including Tianya and Twitter, to analyze and summarize the effectiveness of the proposed framework under different language environments and word segmentation granularities. The results indicate that in Chinese user scenarios, the framework performs comparably with char-level and chunk-level segmentation, and both are significantly superior to template-level segmentation. In English user scenarios, chunk-level segmentation yields the best password-analyzing performance.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/91be6021ba544df5aa0761a63e2632fc
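
The three segmentation granularities named in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not specify how chunks or templates are derived, so the sketch assumes that a chunk is a maximal run of characters of the same class (letter, digit, symbol) and that a template token is the class letter followed by the run length, in the style of PCFG-based password models. The function names `char_level`, `chunk_level`, and `template_level` are hypothetical.

```python
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to a coarse class: L(etter), D(igit), or S(ymbol)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def char_level(password: str) -> list[str]:
    """Char-level segmentation: one token per character."""
    return list(password)


def chunk_level(password: str) -> list[str]:
    """Chunk-level segmentation (assumed rule): maximal same-class runs.

    A learned segmenter could split chunks differently; this is only
    a stand-in to show the granularity.
    """
    return ["".join(run) for _, run in groupby(password, key=char_class)]


def template_level(password: str) -> list[str]:
    """Template-level segmentation (assumed rule): class plus run length."""
    return [f"{char_class(chunk[0])}{len(chunk)}" for chunk in chunk_level(password)]


if __name__ == "__main__":
    pw = "love2024!"
    print(char_level(pw))
    print(chunk_level(pw))
    print(template_level(pw))
```

Under these assumptions, the same password yields nine tokens at char level, three at chunk level, and three structure-only tokens at template level, which shows how much literal content each granularity passes on to the downstream model.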