Noisy Token Removal for Bug Localization: The Impact of Semantically Confusing Misguiding Terms

Authors: Youngkyoung Kim, Misoo Kim, Eunseok Lee
Language: English
Year of publication: 2024
Subject:
Source: IEEE Access, Vol 12, Pp 172396-172409 (2024)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3500367
Description: A bug report is a technical document describing a bug that has occurred in software. Finding the source code files that must be changed to resolve a reported bug is laborious. To automate this process, information retrieval-based bug localization (IRBL) techniques have been proposed. These techniques assess the relevance between the bug report and source files and provide developers with a ranked list of source files. Because they rely heavily on text tokens, removing noisy tokens from the input is essential. To address the problem of prevalent noisy tokens degrading IRBL performance, we define impactful noisy words as misguiding terms and investigate their prevalence and impact. We employed a deep learning model combined with explainable AI techniques to detect misguiding terms, leveraging their semantic embedding capabilities. We conducted extensive experiments on 24 open-source software projects and three IRBL models. Removing misguiding terms improved the mean reciprocal rank of bug localization by 19%, 17%, and 27% on average for the three models, and by up to 120%. The proposed approach effectively distinguishes beneficial terms from noise, yielding better IRBL performance than existing noise detection approaches, with consistent improvements observed across the 24 projects.
Database: Directory of Open Access Journals
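
Illustration: the abstract reports its gains as mean reciprocal rank (MRR), the average over bug reports of 1/rank of the first truly buggy file in the ranked list. The sketch below is only a toy under stated assumptions: the token-overlap scorer `rank_files`, the file names, and the hand-picked misguiding terms are all hypothetical, whereas the paper detects misguiding terms with a deep learning model and explainable AI and evaluates three real IRBL models. It shows how removing tokens that point at the wrong file can raise the reciprocal rank.

```python
def rank_files(report_tokens, files):
    """Rank files by number of tokens shared with the report (toy relevance score)."""
    report = set(report_tokens)
    scores = {path: len(report & set(tokens)) for path, tokens in files.items()}
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(scores, key=lambda path: (-scores[path], path))


def reciprocal_rank(ranked_files, buggy_files):
    """Return 1/rank of the first truly buggy file, or 0.0 if none appears."""
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results):
    """Average reciprocal rank over (ranked_files, buggy_files) pairs."""
    results = list(results)
    if not results:
        return 0.0
    return sum(reciprocal_rank(r, b) for r, b in results) / len(results)


# Hypothetical corpus: token lists standing in for preprocessed source files.
files = {
    "Parser.java": ["parse", "token", "syntax"],
    "Logger.java": ["log", "error", "message", "crash"],
}
buggy = {"Parser.java"}

# Hypothetical bug report; "error", "message", "crash" match the wrong file.
report = ["parse", "error", "message", "crash"]
misguiding = {"error", "message", "crash"}  # hand-picked here, not detected
cleaned = [t for t in report if t not in misguiding]

mrr_before = mean_reciprocal_rank([(rank_files(report, files), buggy)])
mrr_after = mean_reciprocal_rank([(rank_files(cleaned, files), buggy)])
relative_gain = (mrr_after - mrr_before) / mrr_before
```

In this toy case the buggy file moves from rank 2 to rank 1 once the three terms are dropped. The percentage improvements in the abstract are relative MRR changes of this kind, but measured on real projects and models.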