High Occupancy Itemset Mining with Consideration of Transaction Occupancy

Author: Kalyani Mali, Udit Ghosh, Subrata Datta
Year of publication: 2021
Subject:
Source: Arabian Journal for Science and Engineering. 47:2061-2075
ISSN: 2191-4281, 2193-567X
Description: Discovering high occupancy itemsets is an interesting area of research in data mining. In traditional approaches, occupancy computation considers only the portion of each supporting transaction that the itemset occupies. Consequently, these approaches cannot distinguish between the occupancies of the same itemset in different supporting transactions of equal length. When the itemset size equals the transaction length, the occupancy reaches its maximum, which promotes the generation of undesirable itemsets, especially isolated ones. Furthermore, itemsets of equal size have equal average occupancies even when they appear in different transactions of equal length. To address these issues, this paper introduces the concept of transaction occupancy (TO) and then presents a new computational model of itemset occupancy (IO) that takes transaction occupancy into account. Transaction occupancy refers to the portion of the database occupied by the transactions. The paper proposes an efficient list-structure-based algorithm called HOIMTO (high occupancy itemset mining with transaction occupancy) to discover high occupancy itemsets (HOIs) from transactional databases. A new itemset occupancy upper bound (IOUB) is also introduced to reduce the candidate search space. Experimental studies show the effectiveness of the proposed approach in terms of itemset generation, runtime, memory usage and scalability.
Database: OpenAIRE
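To make the limitations described in the abstract concrete, the following is a minimal Python sketch of a traditional occupancy computation. It assumes the per-transaction ratio |X|/|T| averaged over the supporting transactions of X. The abstract does not state this formula, so treat it as an assumption. This is not the paper's transaction occupancy (TO) model or the HOIMTO algorithm. The toy database `DB` and the functions `supporting`, `occupancy_ratios` and `average_occupancy` are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical toy transactional database (each transaction is a set of items).
DB: List[Itemset] = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("ab"),
    frozenset("cde"),
    frozenset("xy"),  # an isolated transaction
]


def supporting(itemset: Itemset, db: List[Itemset]) -> List[Itemset]:
    """Return the transactions that contain every item of `itemset`."""
    return [t for t in db if itemset <= t]


def occupancy_ratios(itemset: Itemset, db: List[Itemset]) -> List[float]:
    """Assumed traditional per-transaction occupancy |X| / |T| for each supporting T."""
    return [len(itemset) / len(t) for t in supporting(itemset, db)]


def average_occupancy(itemset: Itemset, db: List[Itemset]) -> float:
    """Average of the per-transaction ratios; 0.0 if the itemset has no support."""
    ratios = occupancy_ratios(itemset, db)
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    # {a, b} gets the same ratio 2/3 in "abc" and "abd" (equal-length transactions).
    print(occupancy_ratios(frozenset("ab"), DB))
    # {x, y} fills its only transaction, so it scores the maximum 1.0 despite being isolated.
    print(average_occupancy(frozenset("xy"), DB))
    # {a, c} and {d, e} appear in different transactions of length 3 yet tie at 2/3.
    print(average_occupancy(frozenset("ac"), DB), average_occupancy(frozenset("de"), DB))
```

Under this assumed definition the ratio depends only on |X| and |T|, so it cannot tell apart transactions of equal length, and an itemset that exactly fills a single transaction receives the highest possible score. These are the issues the abstract says the transaction occupancy model is designed to address.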