Adjacent Inputs With Different Labels and Hardness in Supervised Learning
Author: | José Luis Vázquez Noguera, Julio César Mello Román, Federico Divina, Jorge Daniel Mello-Román, Sebastián Alberto Grillo, Miguel García-Torres, Pedro E. Gardel-Sotomayor |
---|---|
Publication year: | 2021 |
Subject: | overfitting; supervised learning; machine learning; data complexity; classification; artificial intelligence; General Computer Science; General Engineering; General Materials Science; Electrical engineering. Electronics. Nuclear engineering; TK1-9971; ComputingMethodologies_PATTERNRECOGNITION |
Source: | IEEE Access, Vol. 9, pp. 162487-162498 (2021) |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2021.3131150 |
Description: | An important aspect of the design of effective machine learning algorithms is the complexity analysis of classification problems. In this paper, we propose a study aimed at determining the relation between the number of adjacent inputs with different labels and the number of examples required to induce a classification model. To this aim, we first quantified the adjacent inputs with different labels as a property, using a measure denoted Neighbour Input Variation (NIV). We analyzed the relation of NIV to random data and overfitting. We then demonstrated that a threshold on NIV may determine whether a classification model can generalize to unseen data. We also presented a case study analyzing threshold neural networks and the required size of the first hidden layer as a function of NIV. Finally, we performed experiments with five popular algorithms, analyzing the relation between NIV and the classification error on problems with few dimensions. We conclude that functions whose similar inputs have different outputs with high probability considerably reduce the generalization capacity of classification algorithms. |
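The abstract does not give the formal definition of NIV, but the idea of quantifying adjacent inputs with different labels can be illustrated with a minimal sketch. The code below assumes (hypothetically; the paper's exact measure may differ) that for a Boolean function, NIV is the fraction of Hamming-distance-1 input pairs whose labels differ:

```python
from itertools import product

def niv(labels):
    """Estimate a Neighbour-Input-Variation-style measure for a Boolean function.

    `labels` maps every n-bit input tuple to a class label. Here the measure
    is assumed to be the fraction of Hamming-distance-1 input pairs whose
    labels differ (a hypothetical reading of NIV, not the paper's definition).
    """
    n = len(next(iter(labels)))              # number of input bits
    differing = total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):                   # flip one bit to get a neighbour
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if x < y:                        # count each unordered pair once
                total += 1
                differing += labels[x] != labels[y]
    return differing / total

# XOR on 2 bits: every adjacent input pair gets a different label
xor = {x: x[0] ^ x[1] for x in product((0, 1), repeat=2)}
print(niv(xor))  # -> 1.0
```

Under this reading, XOR attains the maximum value 1.0, matching the abstract's conclusion that functions whose similar inputs have different outputs with high probability are the hardest to generalize.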
Database: | OpenAIRE |
External link: |