A formal series-based unification of the frequent itemset mining approaches
Autor: | Hadda Cherroun, Djelloul Ziadi, Slimane Oulad-Naoui |
---|---|
Rok vydání: | 2017 |
Předmět: |
Theoretical computer science
Unification Computer science Computation 02 engineering and technology Distinctive feature ENCODE Semiring Automaton Human-Computer Interaction Artificial Intelligence Hardware and Architecture 020204 information systems 0202 electrical engineering electronic engineering information engineering 020201 artificial intelligence & image processing Database transaction Implementation Software Information Systems |
Zdroj: | Knowledge and Information Systems. 53:439-477 |
ISSN: | 0219-3116 0219-1377 |
Popis: | Over the last two decades, a great deal of work has been devoted to the algorithmic aspects of the frequent itemset (FI) mining problem, leading to a phenomenal number of algorithms and associated implementations, each of which claims supremacy. Meanwhile, it is generally well agreed that developing a unifying theory is one of the most important issues in data mining research. Hence, our primary motivation for this work is to introduce a high-level formalism for this basic problem, which induces a unified vision of the algorithmic approaches presented so far. The key distinctive feature of the introduced model is that it combines, in one fashion, both the qualitative and the quantitative aspects of this basic problem. In this paper, we propose a new model for the FI-mining task based on formal series. In fact, we encode the itemsets as words over a sorted alphabet and express this problem by a formal series over the counting semiring $$(\mathbb N,+,\times ,0,1)$$(N,+,×,0,1), whose range represents the itemsets, and the coefficients are their supports. The aim is threefold: First, to define a clear, unified and extensible theoretical framework through which we can state the main FI-approaches. Second, to prove a convenient connection between the determinization of the acyclic weighted automaton that represents a transaction dataset and the computation of the associated collection of FI. Finally, to devise a first algorithmic transcription, baptized Wafi, of our model by means of weighted automata, which we evaluate against representative leading algorithms. The obtained results show the suitability of our formalism. |
Databáze: | OpenAIRE |
Externí odkaz: |