A Study of Mining Interesting Patterns and Sequential Patterns over Static Databases and Data Streams Environments

Autor: Yi-Chun Chen, 陳奕鈞
Rok vydání: 2013
Druh dokumentu: 學位論文 ; thesis
Popis: 102
Over the past decade, there have been tremendous interests in analyzing huge amount through various data mining techniques. Many data mining techniques have been well-developed, such as association rules mining, sequential pattern mining and so on. The original problem on mining frequent itemsets focuses on identifying items that have occurred together. However, the number of the mined frequent patterns often exceeds the capacity of human’s mind. Or users might be interested in the other relations between items. Therefore, it is necessary for effectively present patterns according to their interestingness. In recent times, data in many applications are generated as a form of continuous data streams. Since handling data streams is necessary and discovering knowledge behind data streams can often yield substantial benefits, mining over data streams has become one of the most important issues. In this dissertation, we concentrate on the problems of mining interesting patterns and sequential patterns over static databases and data streams environments. First, we discuss the problem of mining substitution rules and proposes the NRM (non-redundant substitution rules mining) algorithm as a solution. A substitution rule is said to be non-redundant if the provided information is not covered by other rules. To make the mining process more efficient, the property of frequent closed patterns is utilized and three lemmas are proposed to prune redundant substitution rules. Our experimental results show that the performance of NRM algorithm is superior to that of naïve apriori-based approaches. Second, the idea of sequential association rule is introduced. The sequential association rule represents the concept that a set of items usually occur after a specific order sequence. Two algorithms, GSAR and PSAR algorithms, are proposed to discover these hidden knowledge. Moreover, experiments are performed on both synthetic and real datasets to show the benefit of our approach. Next, a new concept named knowledge stream is proposed. The proposed approach focuses on continuously differentiating interesting and valuable patterns from data stream, which is the issue for Interestingness-oriented Knowledge Discovery (IOKD). Several principles and interestingness measures are proposed for modeling and measuring the interestingness growth of knowledge in a data stream. A new data structure, Pattern’s Interestingness Tree (PI-Tree) is presented for discovering frequent patterns and helping to distinguish interesting knowledge. Performance Analysis indicates that the proposed approach is efficient for IOKD. Finally, we deal with the problem of sequential patterns mining over data streams. We assume that the transaction of a user is partially coming and that there is no auxiliary for buffering and integrating. We adopt the PTree for mining frequent sequential patterns over data streams and integrate the user’s sequences efficiently. Algorithms with regards to accuracy (PAlgorithm) and space (PSAlgorithm) are proposed to meet the different aspects of users. Moreover, GAlgorithm for mining frequent sequential patterns with a gap limitation is proposed. Many pruning properties are used to further reduce the space usage and improve the accuracy of our algorithms. We also prove that PAlgorithm mine frequent sequential patterns with the approximate support of error guarantee. Through thoughtful experiments, synthetic and real datasets are utilized to verify the feasibility of our algorithms.
Databáze: Networked Digital Library of Theses & Dissertations