Hybrid Parallel Frequent Itemsets Mining Algorithm by Using N-List Structure

Autor: LIU Weiming, ZHANG Chi, MAO Yimin
Jazyk: čínština
Rok vydání: 2022
Předmět:
Zdroj: Jisuanji kexue yu tansuo, Vol 16, Iss 1, Pp 120-136 (2022)
Druh dokumentu: article
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2008068
Popis: Aiming at the problem of unbalanced load, bad efficiency of N-list merge and redundant search for each node based on parallel frequent itemset mining algorithm MRPrePost (parallel PrePost algorithm based on MapReduce), this paper proposes a hybrid parallel frequent itemset mining algorithm based on N-list (HP-FIMBN). Firstly, a load estimation function (LE) is designed to calculate the load of each item in F-list, and a grouping method based on greedy strategy (GM-GS) is proposed, which not only deals with the problem of unbalanced load in the process of data partitioning, but also decreases the size of sub-PPC-tree on local node. Secondly, in order to accelerate the completion of the merging of two N-list structures, it designs an early abandon strategy (EAS), which can not only efficiently avoid invalid calculations in the merging process, but also does not need to traverse the initial N-list structure to get the final result. Finally, the set-enumeration tree is adopted as the search space, a superset equivalent strategy (SES) is proposed to avoid redundant search in the mining process and the final mining results are generated. Experimental results show that the modified algorithm has better performance on mining frequent itemsets in big data environment.
Databáze: Directory of Open Access Journals