An efficient dynamic superset bit-vector approach for mining frequent closed itemsets and their lattice structure

Autor: Md. Samiullah, Chowdhury Farhan Ahmed, Md. Rezaul Karim, Tahrima Hashem
Rok vydání: 2017
Předmět:
Zdroj: Expert Systems with Applications. 67:252-271
ISSN: 0957-4174
Popis: Efficient approach to mine frequent closed itemsets and their lattice structure.We propose a new memory efficient dynamic superset bit-vector (DSBV) structure.New candidate pruning techniques are developed.DSBV establishes hierarchical subset-superset relationship of itemset lattice.Extensive experiments to show scalability and supremacy of our proposed approach. Fast discovery of association rules from millions of transactions in a variety of large databases has now become a major challenge in data mining domain. Frequent itemsets and frequent closed itemsets are key sources of mining association rules. Association rules can be mined efficiently from lattice of itemsets. Non-redundancy, accuracy, time efficiency and memory usage are the factors those need to be considered while developing algorithms in order to extract meaningful association rules. In this paper, we propose an efficient bit-vector approach that exploits dual properties, tidset and superset information of an itemset in order to mine frequent closed itemsets with their lattice structure. We introduce a new memory efficient data structure called dynamic superset bit-vector to establish the relationship among frequent closed itemsets in a lattice. The novelty of our approach is that it effectively uses dual data structures called a dynamic bit-vector and a dynamic superset bit-vector jointly in order to reduce the search space and eliminate the generators of non-closed itemsets. Our proposed approach efficiently builds up the subset-superset relationship among the closed itemsets of the lattice structure in a bottom-up manner. Extensive experiments using real-life datasets and pervasive performance comparison with the existing works prove the efficiency and scalability of our proposed approach.
Databáze: OpenAIRE