Reference itemsets: useful itemsets to approximate the representation of frequent itemsets.

Authors: Huang, Jheng-Nan; Hong, Tzung-Pei; Chiang, Ming-Chao
Subject:
Source: Soft Computing - A Fusion of Foundations, Methodologies & Applications; Oct 2017, Vol. 21, Issue 20, pp. 6143-6157, 15 pp.
Abstract: Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets may be extremely large when a low minimum support threshold is given, so designing a compact representation that compresses and describes them is an interesting topic. Most past research on compact representations has focused on frequent closed itemsets and frequent maximal itemsets. The former is a lossless representation from which all frequent itemsets and their frequencies can be fully recovered. In contrast, the latter loses some information: it preserves which itemsets are frequent but cannot recover their frequencies. In this paper, we propose a new compact representation that lies between closed itemsets and maximal itemsets. It preserves all frequent itemsets and identifies their approximate frequencies. In addition, an efficient algorithm corresponding to this new concept is designed to find the related key information in databases. Finally, a series of experiments is conducted to show the effectiveness of the compact representation and the performance of the proposed algorithm. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
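
Note: The contrast the abstract draws between closed and maximal itemsets can be illustrated with a small brute-force sketch. It is not the paper's algorithm and does not implement the proposed reference itemsets; the toy transactions, the threshold, and the helper names (`frequent_itemsets`, `support_from_closed`) are assumptions made up for illustration.

```python
from itertools import combinations

# Toy transaction database (hypothetical; not taken from the paper).
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]
min_support = 2  # assumed absolute minimum support threshold


def frequent_itemsets(db, minsup):
    """Brute-force enumeration: map each frequent itemset to its support."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in db if candidate <= t)
            if count >= minsup:
                result[candidate] = count
    return result


frequent = frequent_itemsets(transactions, min_support)

# Closed: no proper superset has the same support.
closed = {
    s: c
    for s, c in frequent.items()
    if not any(s < t and c == frequent[t] for t in frequent)
}

# Maximal: no proper superset is frequent at all.
maximal = {s for s in frequent if not any(s < t for t in frequent)}


def support_from_closed(itemset, closed_sets):
    """Recover an itemset's support as the largest support among its closed supersets."""
    return max(c for s, c in closed_sets.items() if itemset <= s)


print(len(frequent), len(closed), len(maximal))
```

On this toy data, the closed itemsets are enough to recover the exact support of every frequent itemset through `support_from_closed`, whereas the maximal itemsets only indicate which itemsets are frequent. A representation lying between the two, as the abstract describes, would trade some of that exactness for a smaller set.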