Understanding and Detecting Harmful Code
Authors: Leopoldo Teixeira, Márcio Garcia Ribeiro, Alessandro Garcia, Baldoino Fonseca, Jairo Francisco de Souza, R.P.A. Lima, Rafael Maiani de Mello, Rohit Gheyi
Year of publication: 2020
Subject: Computer science, Software engineering, Code smell, Software quality, Naive Bayes classifier
Source: SBES
DOI: 10.1145/3422392.3422420
Description: Code smells typically indicate poor design and implementation choices that may degrade software quality. Hence, they need to be carefully detected so that such poor design can be avoided. In this context, some studies try to understand the impact of code smells on software quality, while others propose rule-based or machine learning-based techniques to detect code smells. However, none of those studies or techniques focuses on analyzing code snippets that are genuinely harmful to software quality. This paper presents a study to understand and classify code harmfulness. We analyze harmfulness in terms of CLEAN, SMELLY, BUGGY, and HARMFUL code. We define HARMFUL CODE as a SMELLY code element with one or more reported bugs, whether those bugs have been fixed or not. Thus, the incidence of HARMFUL CODE may represent an increased risk of introducing new defects and/or design problems while fixing it. We perform our study with 22 smell types, 803 versions of 13 open-source projects, 40,340 bugs, and 132,219 code smells. The results show that even though we have a high number of code smells, only 0.07% of those smells are harmful. Abstract Function Call From Constructor is the smell type most related to HARMFUL CODE. To cross-validate our results, we also perform a survey with 60 developers. Most of them (98%) consider code smells harmful to the software, and 85% of those developers believe that code smell detection tools are important. However, those developers are not concerned about selecting tools that are able to detect HARMFUL CODE. We also evaluate machine learning techniques to classify code harmfulness: they reach an effectiveness of at least 97% in classifying HARMFUL CODE. While Random Forest is effective in classifying both SMELLY and HARMFUL CODE, Gaussian Naive Bayes is the least effective technique. Our results also suggest that both software and developer metrics are important to classify HARMFUL CODE.
Database: OpenAIRE
External link:
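The abstract describes evaluating machine learning classifiers (Random Forest vs. Gaussian Naive Bayes) over software and developer metrics to label code elements as CLEAN, SMELLY, BUGGY, or HARMFUL. The sketch below illustrates that kind of setup only; the feature names and the randomly generated data are assumptions for demonstration, not the paper's actual dataset, metrics, or results.

```python
# Minimal sketch of a harmfulness-classification experiment, assuming
# hypothetical software metrics (size, complexity, smell count) and
# developer metrics (commits, experience). Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_samples = 1_000

# Hypothetical feature matrix: one row per code element.
X = np.column_stack([
    rng.integers(10, 2_000, n_samples),   # lines of code
    rng.integers(1, 60, n_samples),       # cyclomatic complexity
    rng.integers(0, 8, n_samples),        # code smells in the element
    rng.integers(1, 500, n_samples),      # commits by the last author
    rng.integers(30, 3_000, n_samples),   # author experience (days)
])

# Harmfulness labels, with HARMFUL kept rare as in the study's findings.
labels = np.array(["CLEAN", "SMELLY", "BUGGY", "HARMFUL"])
y = rng.choice(labels, n_samples, p=[0.55, 0.30, 0.12, 0.03])

# Compare the two classifiers named in the abstract via cross-validation.
for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                  ("Gaussian Naive Bayes", GaussianNB())]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
    print(f"{name}: macro F1 = {scores.mean():.2f} (+/- {scores.std():.2f})")
```

On real data, the ensemble model would typically handle the mixed, skewed metric distributions better than the Gaussian independence assumption, which is consistent with the abstract's report that Random Forest outperforms Gaussian Naive Bayes.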