A Relaxation-based Approach for Mining Diverse Closed Patterns
Autor: | Noureddine Aribi, Abdelkader Ouali, Yahia Lebbah, Samir Loudni, Arnold Hien, Albrecht Zimmermann, Mohammed El Amine Laghzaoui |
---|---|
Přispěvatelé: | Equipe CODAG - Laboratoire GREYC - UMR6072, Groupe de Recherche en Informatique, Image et Instrumentation de Caen (GREYC), Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Normandie Université (NU)-Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU), Département Automatique, Productique et Informatique (IMT Atlantique - DAPI), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Théorie, Algorithmes et Systèmes en Contraintes (TASC ), Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Laboratoire d'Informatique et Technologies de l'Information d'Oran (LITIO), Université d'Oran Al-Sanya, Frank Hutter, Kristian Kersting, Jefrey Lijffijt, Isabel Valera |
Jazyk: | angličtina |
Rok vydání: | 2020 |
Předmět: |
Data mining DM
Theoretical computer science Boosting (machine learning) Jaccard index Computer science business.industry Monotonic function 02 engineering and technology Upper and lower bounds [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] 020204 information systems Diversity Constraints 0202 electrical engineering electronic engineering information engineering Constraint programming 020201 artificial intelligence & image processing business Agile software development |
Zdroj: | Machine Learning and Knowledge Discovery in Databases-European Conference, {ECML} {PKDD} 2020 Machine Learning and Knowledge Discovery in Databases-European Conference, 2020, Sep 2020, Ghent (virtual), Belgium. pp.36--54, ⟨10.1007/978-3-030-67658-2_3⟩ HAL European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2020, Sep 2020, Gand, Belgium Machine Learning and Knowledge Discovery in Databases ISBN: 9783030676575 ECML/PKDD (1) |
Popis: | International audience; In recent years, pattern mining has moved from a slow-moving repeated three-step process to a much more agile iterative/user-centric mining model. A vital ingredient of this framework is the ability to quickly present a set of diverse patterns to the user. In this paper, we use constraint programming (well-suited to user-centric mining due to its rich constraint language) to efficiently mine a diverse set of closed patterns. Diversity is controlled through a threshold on the Jaccard similarity of pattern occurrences. We show that the Jaccard measure has no monotonicity property, which prevents usual pruning techniques and makes classical pattern mining unworkable. This is why we propose anti-monotonic lower and upper bound relaxations, which allow effective pruning, with an efficient branching rule, boosting the whole search process. We show experimentally that our approach significantly reduces the number of patterns and is very efficient in terms of running times, particularly on dense data sets. |
Databáze: | OpenAIRE |
Externí odkaz: |