Adjacent Inputs With Different Labels and Hardness in Supervised Learning
Author: | José Luis Vázquez Noguera, Julio César Mello Román, Federico Divina, Jorge Daniel Mello-Román, Sebastián Alberto Grillo, Miguel García-Torres, Pedro E. Gardel-Sotomayor |
---|---|
Publication year: | 2021 |
Subject: | overfitting; supervised learning; machine learning; data complexity; classification; artificial intelligence; General Computer Science; General Engineering; General Materials Science; Electrical engineering. Electronics. Nuclear engineering; TK1-9971; ComputingMethodologies_PATTERNRECOGNITION |
Source: | IEEE Access, Vol. 9, pp. 162487-162498 (2021) |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2021.3131150 |
Description: | An important aspect of the design of effective machine learning algorithms is the complexity analysis of classification problems. In this paper, we propose a study aimed at determining the relation between the number of adjacent inputs with different labels and the number of examples required to induce a classification model. To this aim, we first quantified the adjacent inputs with different labels as a property, using a measure denoted Neighbour Input Variation (NIV). We analyzed the relation of NIV to random data and overfitting. We then demonstrated that a threshold on NIV may determine whether a classification model can generalize to unseen data. We also presented a case study analyzing threshold neural networks and the required size of the first hidden layer as a function of NIV. Finally, we performed experiments with five popular algorithms, analyzing the relation between NIV and the classification error on problems with few dimensions. We conclude that functions whose similar inputs have different outputs with high probability considerably reduce the generalization capacity of classification algorithms. |
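The abstract does not give the formal definition of NIV, but the idea of quantifying adjacent inputs with different labels can be illustrated with a minimal sketch. The code below assumes (hypothetically; the paper's exact measure may differ) that for a Boolean function, NIV is the fraction of Hamming-distance-1 input pairs whose labels differ:

```python
from itertools import product

def niv(labels):
    """Estimate a Neighbour-Input-Variation-style measure for a Boolean function.

    `labels` maps every n-bit input tuple to a class label. Here the measure
    is assumed to be the fraction of Hamming-distance-1 input pairs whose
    labels differ (a hypothetical reading of NIV, not the paper's definition).
    """
    n = len(next(iter(labels)))              # number of input bits
    differing = total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):                   # flip one bit to get a neighbour
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if x < y:                        # count each unordered pair once
                total += 1
                differing += labels[x] != labels[y]
    return differing / total

# XOR on 2 bits: every adjacent input pair gets a different label
xor = {x: x[0] ^ x[1] for x in product((0, 1), repeat=2)}
print(niv(xor))  # -> 1.0
```

Under this reading, XOR attains the maximum value 1.0, matching the abstract's conclusion that functions whose similar inputs have different outputs with high probability are the hardest to generalize.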
Database: | OpenAIRE |
External link: |