Constraint-Based Mining of Sequential Patterns over Datasets with Consecutive Repetitions

Author:	Marion Leleu, Guillaume Euvrard, Christophe Rigotti, Jean-François Boulicaut
Contributors:	Lavrač, Nada; Gamberger, Dragan; Todorovski, Ljupčo; Blockeel, Hendrik; Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2), Caisse des Dépôts et Consignations (CDC)
Subject:	Sequence; Discretization; Computer science; business.industry; Process (computing); 02 engineering and technology; Constraint satisfaction; computer.software_genre; Machine learning; Constraint (information theory); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; [INFO]Computer Science [cs]; Data mining; Artificial intelligence; business; computer
Source:	7th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD'03, Sep 2003, Cavtat-Dubrovnik, Croatia. pp. 303-314, ⟨10.1007/978-3-540-39804-2_28⟩; Knowledge Discovery in Databases: PKDD 2003, ISBN: 9783540200857; PKDD
DOI:	10.1007/978-3-540-39804-2_28
Description:	Constraint-based mining of sequential patterns is an active research area motivated by many application domains. In practice, real sequence datasets can contain consecutive repetitions of symbols (e.g., DNA sequences, discretized stock market data), which can lead to such high resource consumption during pattern extraction that even efficient algorithms become unusable. We propose a constraint-based mining algorithm that compacts these consecutive repetitions, drastically reducing the amount of data to process and speeding up extraction. The technique introduced in this paper retains the advantages of existing state-of-the-art algorithms based on the notion of occurrence lists, while extending their applicability to datasets containing consecutive repetitions. We analyze the benefits obtained using synthetic datasets, and show that the approach is of practical interest on real datasets.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::03be8ccce68a5dbd5219b13f268eb7e7 https://infoscience.epfl.ch/record/230373