Unsupervised Log Sequence Segmentation

Authors:	Wojciech Dobrowolski, Mikołaj Libura, Maciej Nikodem, Olgierd Unold
Language:	English
Year of publication:	2024
Subject:	Automated log analysis; language abstraction; unsupervised sequence segmentation; software log segmentation; natural language processing; problem-solving; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol. 12, pp. 79003-79013 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3409425
Description:	In automated log analysis, the log sequence is often treated as a language. It follows that a log sequence should have a structure consisting of words and sentences. However, the literature does not define a "word" in a log sequence uniformly: the first approach splits the sequence line by line, while the second extracts word-like structures from it. The main challenge of the second approach is evaluating the results. Unsupervised metrics have been proposed, but we found them to be inconsistent. Other methods rely on manually prepared gold standards, yet no gold-standard segmentation benchmark is available for any set of logs. To address this, we created a benchmark of preprocessed log event IDs gathered from the open-source CloudStack logs and from executions of commercial Nokia software. With the help of a human expert, we created a gold-standard segmentation and made it publicly available. We then tested known unsupervised segmentation methods previously applied to log sequences and adapted the Nested Pitman-Yor Language Model. We found that the performance of these methods differs significantly between the natural language domain and the log domain. VotingExperts achieved the best F-score: 97.3% for CloudStack and 44.1% for Nokia logs. The results are related to the unigram entropy of the log sequence, which differs across software platforms.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/309f87c696804529992c17c90e92c9a0
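The description reports F-scores against a gold-standard segmentation and relates them to the unigram entropy of the event-ID sequence. The record does not say which F-score variant the authors used, so the Python sketch below assumes the boundary-based precision, recall, and F1 common in word-segmentation studies, plus base-2 Shannon entropy over event-ID frequencies. The function names and the toy event IDs (E1 to E5) are hypothetical, and the sketch does not implement VotingExperts or the Nested Pitman-Yor Language Model.

```python
from collections import Counter
from math import log2


def boundaries(segments):
    """Return the set of cut positions implied by a list of segments.

    A cut at position i means a boundary falls between token i-1 and token i.
    The end of the sequence is excluded because it is always a boundary.
    """
    cuts, pos = set(), 0
    for seg in segments[:-1]:
        pos += len(seg)
        cuts.add(pos)
    return cuts


def boundary_f_score(predicted, gold):
    """Boundary precision, recall, and F1 of a predicted segmentation vs. gold.

    Assumption: this boundary-based variant may differ from the paper's metric.
    """
    # Both segmentations must split the same underlying event-ID sequence.
    if [t for s in predicted for t in s] != [t for s in gold for t in s]:
        raise ValueError("segmentations cover different token sequences")
    pred_cuts, gold_cuts = boundaries(predicted), boundaries(gold)
    hits = len(pred_cuts & gold_cuts)
    precision = hits / len(pred_cuts) if pred_cuts else 0.0
    recall = hits / len(gold_cuts) if gold_cuts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def unigram_entropy(sequence):
    """Shannon entropy, in bits, of the event-ID unigram distribution."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum(c / total * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical toy data: one gold and one predicted segmentation.
    gold = [["E1", "E2", "E3"], ["E4", "E5"], ["E1", "E2", "E3"]]
    predicted = [["E1", "E2"], ["E3", "E4", "E5"], ["E1", "E2", "E3"]]
    print(boundary_f_score(predicted, gold))  # (0.5, 0.5, 0.5)
    tokens = [t for seg in gold for t in seg]
    print(unigram_entropy(tokens))  # 2.25
```

In the toy example the gold cuts are {3, 5} and the predicted cuts are {2, 5}, so one of two boundaries matches on each side. A stricter alternative is token-level (word) precision and recall, which credits a predicted segment only when both of its edges match a gold segment.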