Mining Statistically Significant Patterns with High Utility

Autor: Huijun Tang, Jiangbo Qian, Yangguang Liu, Xiao-Zhi Gao
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: International Journal of Computational Intelligence Systems, Vol 15, Iss 1, Pp 1-19 (2022)
Druh dokumentu: article
ISSN: 1875-6883
DOI: 10.1007/s44196-022-00149-7
Popis: Abstract Statistically significant pattern mining (SSPM) is to mine patterns with significance based on hypothesis test. Under the constraint of statistical significance, our study aims to introduce a new preference relation into high utility patterns and to discover high utility and significant patterns (HUSPs) from transaction datasets, which has never been considered in existing SSPM problems. Our approach can be divided into two parts, HUSP-Mining and HUSP-Test. HUSP-Mining looks for HUSP candidates and HUSP-Test tests their significance. HUSP-Mining is not outputting all high utility itemsets (HUIs) as HUSP candidates; it is established based on candidate length and testable support requirements which can remove many insignificant HUIs early in the mining process; compared with the traditional HUIs mining algorithm, it can get candidates in a short time without losing the real HUSPs. HUSP-Test is to draw significant patterns from the results of HUSP-Mining based on Fisher’s test. We propose an iterative multiple testing procedure, which can alternately and efficiently reject a hypothesis and safely ignore the hypotheses that have less utility than the rejected hypothesis. HUSP-Test controls Family-wise Error Rate (FWER) under a user-defined threshold by correcting the test level which can find more HUSPs than standard Bonferroni’s control. Substantial experiments on real datasets show that our algorithm can draw HUSPs efficiently from transaction datasets with strong mathematical guarantee.
Databáze: Directory of Open Access Journals