Author: Stéphane Lallich, Djamel A. Zighed, Fabien Rico, Fabrice Muhlenbach
Year of publication: 2015
Source: Neurocomputing, 160:3-17
ISSN: 0925-2312
Description: This paper focuses on detecting likely mislabeled instances in a learning dataset. Two solutions are considered, both built on the same framework of topological (neighborhood) graphs. The first is a statistical approach based on Cut Edges Weighted (CEW) statistics computed in the neighborhood graph. The second is a Relaxation Technique (RT) that optimizes a local criterion in the neighborhood graph. Evaluation with ROC curves shows good results: almost 90% of the mislabeled instances are retrieved at a false-positive rate below 20%. Removing the samples that these approaches flag as mislabeled generally improves the performance of classical machine learning algorithms.
Database: OpenAIRE
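The cut-edge idea behind CEW can be illustrated with a short sketch. It is a simplified, hypothetical reconstruction, not the authors' implementation. It assumes an undirected k-nearest-neighbor graph with unit edge weights (the paper's own neighborhood graph and weighting may differ) and a normal approximation in which each neighbor of instance i disagrees with i's label independently with probability 1 − π, where π is the overall proportion of i's class. A large positive z-score flags i as likely mislabeled. The names `knn_graph` and `local_cew_scores` are illustrative only, and the Relaxation Technique is not covered.

```python
import math
from collections import Counter


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph with unit edge weights (an assumption)."""
    n = len(points)
    edges = {i: {} for i in range(n)}
    for i in range(n):
        # Sort the other points by Euclidean distance; ties broken by index.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            edges[i][j] = 1.0  # symmetrize so the graph is undirected
            edges[j][i] = 1.0
    return edges


def local_cew_scores(edges, labels):
    """z-score of the cut-edge weight around each vertex.

    Hypothetical null model: each neighbor's label disagrees with vertex i
    independently with probability 1 - pi, pi = global share of i's class.
    """
    n = len(labels)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}
    scores = []
    for i in range(n):
        weights = edges[i]
        # Observed weight of "cut" edges: edges to differently labeled neighbors.
        cut = sum(w for j, w in weights.items() if labels[j] != labels[i])
        p = prior[labels[i]]
        s1 = sum(weights.values())
        s2 = sum(w * w for w in weights.values())
        mean = (1 - p) * s1
        var = p * (1 - p) * s2
        scores.append((cut - mean) / math.sqrt(var) if var > 0 else 0.0)
    return scores


# Toy usage: two well-separated clusters; the point (1, 1) is deliberately mislabeled.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
lab = ["a", "a", "a", "b", "b", "b", "b", "b"]
scores = local_cew_scores(knn_graph(pts, k=3), lab)
print(max(range(len(pts)), key=scores.__getitem__))  # → 3
```

In the toy data, the point (1, 1) sits among three `a` instances but carries label `b`, so all three of its edges are cut and it receives the highest score (√5 ≈ 2.24), while every other point scores below zero. Ranking instances by such a score and sweeping a threshold is one way to trace an ROC curve like the one the description reports.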