Machine Learning Techniques for Code Smells Detection: A Systematic Mapping Study

Author:	Amadeu Silveira Campanelli, Fernando Silva Parreiras, Frederico Luiz Caram, Bruno Rafael de Oliveira Rodrigues
Publication year:	2019
Subject:	Source code; Computer Networks and Communications; business.industry; Computer science; Interpretation (philosophy); media_common.quotation_subject; 010102 general mathematics; Code smell; 020207 software engineering; 02 engineering and technology; computer.software_genre; 01 natural sciences; Computer Graphics and Computer-Aided Design; Code refactoring; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Artificial intelligence; InformationSystems_MISCELLANEOUS; 0101 mathematics; Systematic mapping; business; computer; Software; Natural language processing; media_common
Source:	International Journal of Software Engineering and Knowledge Engineering. 29:285-316
ISSN:	1793-6403 0218-1940
Description:	Code smells, or bad smells, are an accepted way to identify design flaws in source code. Although they have been explored by researchers, their interpretation by programmers is rather subjective. One way to deal with this subjectivity is to use machine learning techniques. This paper provides an overview of the machine learning techniques and code smells found in the literature, aiming to determine which methods and practices are used when applying machine learning to code smell identification and which machine learning techniques have been used for that purpose. A mapping study was used to identify the techniques used for each smell. We found that Bloaters were the main kind of smell studied, addressed by 35% of the papers. The most commonly used technique was Genetic Algorithms (GA), used by 22.22% of the papers. Regarding the smells addressed by each technique, there was a high level of redundancy, in that the smells are covered by a wide range of algorithms. Nevertheless, Feature Envy stood out, being targeted by 63% of the techniques. In terms of performance, the best average was provided by Decision Tree, followed by Random Forest, Semi-supervised and Support Vector Machine Classifier techniques. Five of the 25 analyzed smells were not handled by any machine learning technique. Most techniques focus on several code smells, and in general no technique outperforms the others, except for a few specific smells. We also found a lack of comparable results due to the heterogeneity of the data sources and of the reported results. We recommend further empirical studies that assess the performance of these techniques on a standardized dataset to improve the reliability of comparisons and replicability.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d2d7b6165b3062e194cc66cf41be8632 https://doi.org/10.1142/s021819401950013x