ADESIT: Visualize the Limits of your Data for Supervised Learning
Author: Pierre Faure--Giovagnoli, Marie Le Guilly, Vasile-Marian Scuturici, Jean-Marc Petit
Contributors: Compagnie Nationale du Rhône (CNR); Base de Données (BD); Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS); Institut National des Sciences Appliquées de Lyon (INSA Lyon); Institut National des Sciences Appliquées (INSA); Université de Lyon; Centre National de la Recherche Scientifique (CNRS); Université Claude Bernard Lyon 1 (UCBL); École Centrale de Lyon (ECL); Université Lumière - Lyon 2 (UL2)
Language: English
Year of publication: 2021
Subject:
[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]
[STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC]
Supervised learning; machine learning; upper and lower bounds; go/no-go; quality (business); counterexample; artificial intelligence; computer science; process (engineering)
Source: VLDB 2021: 47th International Conference on Very Large Data Bases, Aug 2021, Copenhagen, Denmark (HAL)
Description: Thanks to the numerous machine learning tools available nowadays, it is easier than ever to derive a model from a dataset in the context of a supervised learning problem. However, when such a model performs poorly compared with the expected performance, the underlying question of whether a good model can exist at all is often overlooked, and one may simply be tempted to try different parameters or another model architecture. This is why the quality of the learning examples should be assessed as early as possible: it acts as a go/no-go signal for the subsequent, potentially costly, learning process. With ADESIT, we provide a way to evaluate the ability of a dataset to support a given supervised learning problem through statistics and visual exploration. Notably, we build on recent studies proposing the use of functional dependencies, and specifically of counterexample analysis, to provide dataset cleanliness statistics as well as a theoretical upper bound on the prediction accuracy directly linked to the problem settings (measurement uncertainty, expected generalization, etc.). In brief, ADESIT is intended to be part of an iterative data refinement process, right after data selection and right before the machine learning step itself. Through further analysis for a given problem, the user can characterize, clean, and export dynamically selected subsets, making it easier to understand which regions of the data could be refined and where data precision must be improved, for example by using new or more precise sensors. (An illustrative sketch of the counterexample-based accuracy bound is given below this record.)
Database: OpenAIRE
External link:
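The description above refers to a theoretical upper bound on prediction accuracy obtained from counterexamples to the functional dependency "features → label". The Python sketch below illustrates one simple way to compute such a bound; it is an assumption-laden illustration, not ADESIT's actual implementation: the function name, the per-feature tolerance `eps`, and the greedy maximal matching are choices made for this sketch, and the bound only applies to predictors that cannot distinguish inputs differing by less than the tolerance.

```python
import itertools
import numpy as np

def counterexample_upper_bound(X, y, eps):
    """Illustrative upper bound on achievable accuracy (hypothetical helper).

    Two examples form a counterexample to the dependency "features -> label"
    when their features differ by at most `eps` (per feature) while their
    labels differ.  A predictor that maps indistinguishable inputs to the same
    output can classify at most one example of each such pair correctly, so a
    maximal matching of counterexample pairs yields an upper bound on accuracy.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)

    # Enumerate counterexample pairs (edges of the conflict graph); O(n^2) scan.
    conflicts = [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if np.all(np.abs(X[i] - X[j]) <= eps) and y[i] != y[j]
    ]

    # Greedy maximal matching: each matched pair costs at least one error.
    used, matching = set(), 0
    for i, j in conflicts:
        if i not in used and j not in used:
            used.update((i, j))
            matching += 1

    return (n - matching) / n if n else 1.0
```

For example, with `X = [[0.0], [0.1], [2.0], [2.05]]`, `y = [0, 1, 1, 1]` and `eps = 0.2`, the first two points form a counterexample pair, so no predictor of the kind assumed above can exceed 75% accuracy on this dataset. A maximal matching gives a valid but possibly loose bound; the exact statistics reported by ADESIT may be computed differently and may be tighter.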