Just enough semantics: An information theoretic approach for IR-based software bug localization
Authors: | Anas Mahmoud, Miroslav Tushev, Saket Khatiwada |
---|---|
Year of publication: | 2018 |
Subject: |
Information retrieval; Source code; Exploit; Computer science; Software engineering; Context (language use); Engineering and technology; Pointwise mutual information; Computer Science Applications; Semantic similarity; Software bug; Information systems; Electrical engineering, electronic engineering, information engineering; Software system; Data mining; Normalized Google distance; Software |
Source: | Information and Software Technology. 93:45-57 |
ISSN: | 0950-5849 |
DOI: | 10.1016/j.infsof.2017.08.012 |
Description: | Context: Software systems are often shipped with defects. Whenever a bug is reported, developers use the information in the associated report to locate the source code fragments that must be modified to fix the bug. However, as software systems grow in size and complexity, bug localization can become a tedious and time-consuming process. To minimize the manual effort, contemporary bug localization tools use Information Retrieval (IR) methods for automated support. IR methods exploit the textual content of bug reports to automatically capture and rank relevant buggy source files. Objective: In this paper, we propose a new paradigm of information-theoretic IR methods to support bug localization tasks in software systems. These methods, including Pointwise Mutual Information (PMI) and Normalized Google Distance (NGD), exploit the co-occurrence patterns of code terms in the software system to reveal hidden textual semantic dimensions that other methods often fail to capture. Our objective is to establish accurate semantic similarity relations between source code and bug reports. Method: Five benchmark datasets from different application domains are used to conduct our analysis. The proposed methods are compared against classical IR methods that are commonly used in bug localization research. Results: The results show that information-theoretic IR methods significantly outperform classical IR methods, providing a semantically aware yet computationally efficient solution for bug localization in large and complex software systems. (A replication package is available at: http://seel.cse.lsu.edu/data/ist17.zip ). Conclusions: Information-theoretic co-occurrence methods provide “just enough semantics” necessary to establish relations between bug reports and code artifacts, achieving a balance between simple lexical methods and computationally expensive semantic IR methods that require substantial amounts of data to function properly. |
Database: | OpenAIRE |
External link: |
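
The abstract in this record names Pointwise Mutual Information (PMI) and Normalized Google Distance (NGD) as co-occurrence measures between terms. As a rough illustration only, the sketch below computes both from document frequencies using their commonly cited textbook definitions. It is not taken from the paper or its replication package: the toy corpus, the function names (`doc_freq`, `pmi`, `ngd`), and the choice to treat each source file as a set of terms are all assumptions made for this example.

```python
import math

# Hypothetical toy corpus: each "document" is the set of terms in one source file.
docs = [
    {"file", "open", "read"},
    {"file", "open", "close"},
    {"socket", "connect"},
    {"file", "write"},
]

def doc_freq(docs, *terms):
    """Count the documents that contain every one of the given terms."""
    return sum(1 for d in docs if all(t in d for t in terms))

def pmi(docs, x, y):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with p estimated by document frequency."""
    n = len(docs)
    fx, fy, fxy = doc_freq(docs, x), doc_freq(docs, y), doc_freq(docs, x, y)
    if fxy == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2((fxy / n) / ((fx / n) * (fy / n)))

def ngd(docs, x, y):
    """NGD(x, y) = (max(log fx, log fy) - log fxy) / (log N - min(log fx, log fy))."""
    n = len(docs)
    fx, fy, fxy = doc_freq(docs, x), doc_freq(docs, y), doc_freq(docs, x, y)
    if fxy == 0:
        return float("inf")  # no co-occurrence: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        return 0.0  # both terms occur in every document
    return (max(lx, ly) - lxy) / denom

print(pmi(docs, "file", "open"))    # positive: the terms co-occur more than chance predicts
print(ngd(docs, "file", "open"))    # small distance: the terms are related
print(ngd(docs, "file", "socket"))  # inf: the terms never appear together
```

Higher PMI and lower NGD both indicate stronger association. How such term-level scores are aggregated into a similarity between a bug report and a source file is left open here, since the record does not describe it.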