Efficiently Maintaining the Fast Updated Sequential Pattern Trees With Sequence Deletion

Autor: Tzung-Pei Hong, Wensheng Gan, Jerry Chun-Wei Lin
Jazyk: angličtina
Rok vydání: 2014
Předmět:
Zdroj: IEEE Access, Vol 2, Pp 1374-1383 (2014)
ISSN: 2169-3536
Popis: Among the discovered knowledge, sequential-pattern mining is used to discover the frequent subsequences from a sequence database. Most research handles the static database in batch mode to discover the desired sequential patterns. In the past, the fast updated (FUP) and Fast UPdated 2 (FUP2) concepts were adopted to, respectively, maintain and update the discovered sequential patterns with sequence insertion and sequence deletion based on the designed FUP sequential pattern (FUSP)-tree structure. Based on the FUP or FUP2 concepts, original customer sequences are required to be rescanned if it is necessary to maintain and update the unpromising (small) sequences from the original database. In the past, pre-large concept was designed to keep the prelarge itemsets as the buffer to avoid the database rescan each time whether transaction insertion or deletion in the dynamic databases. In this paper, the prelarge concept is adopted to handle the discovered sequential patterns with sequence deletion. An FUSP tree is first built to keep only the frequent 1-sequences from the original database. The prelarge 1-sequences are also kept in a set for later maintenance approach. When some sequences are deleted from the original database, the proposed algorithm is then performed to divide the kept frequent 1-sequences and prelarge 1-sequences from the original database and the mined 1-sequences from the deleted customer sequences into three parts with nine cases. Each case is then processed by the designed algorithm to maintain and update the built FUSP tree. When the number of deleted customer sequences is smaller than the safety bound of the prelarge concept, the original customer sequences are unnecessary to be rescanned, but the sequential patterns can still be actually maintained and updated. Experiments are conducted to show the performance of the proposed algorithm in terms of execution time and the number of tree nodes.
Databáze: OpenAIRE