Description: |
A bug report is a technical document describing bugs that have occurred in software. Finding the source code files that must be changed to resolve a reported bug is a laborious task. To automate this process, information retrieval-based bug localization (IRBL) techniques have been proposed. These techniques assess the relevance between a bug report and source files, providing developers with a ranked list of candidate files. They rely heavily on text tokens, making it essential to remove noisy tokens from the input. Because prevalent noisy tokens degrade IRBL performance, we define impactful noisy words as misguiding terms and investigate their prevalence and impact. To detect misguiding terms, we employed a deep learning model combined with explainable AI techniques, leveraging the model's semantic embedding capabilities. We conducted extensive experiments on 24 open-source software projects and three IRBL models. By removing misguiding terms, the mean reciprocal rank of bug localization improved on average by 19%, 17%, and 27% for the three models, and by up to 120%. The proposed approach effectively distinguishes beneficial terms from noise, leading to superior IRBL performance compared to existing noise-detection approaches, with consistent improvements observed across all 24 projects.