An efficient closed frequent itemset miner for the MOA stream mining system
Autor: | Quadrana, Massimo, Bifet Figuerol, Albert Carles, Gavaldà Mestre, Ricard|||0000-0003-4736-7179 |
---|---|
Přispěvatelé: | Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge |
Jazyk: | angličtina |
Rok vydání: | 2013 |
Předmět: | |
Zdroj: | Recercat. Dipósit de la Recerca de Catalunya instname UPCommons. Portal del coneixement obert de la UPC Universitat Politècnica de Catalunya (UPC) |
Popis: | Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift. |
Databáze: | OpenAIRE |
Externí odkaz: |