Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining
Authors: | Michael Hecker, Andrea Tagarelli, Fedja Hadzic, Dang Bach Bui |
---|---|
Year of publication: | 2014 |
Subject: |
Association rule learning, Computer Networks and Communications, Computer science, Rate reduction, Tree mining, Pattern recognition, Structural classification, Tree (data structure), Artificial intelligence, Hardware and Architecture, Associative classifier, Data mining, Classifier (UML), Software, Associative property, Information Systems |
Source: | Journal of Intelligent Information Systems. 45:397-421 |
ISSN: | 1573-7675 0925-9902 |
DOI: | 10.1007/s10844-014-0312-9 |
Description: | Tree-structured data are popular in many domains, making structural classification an important task. In this paper, an associative classification method is introduced based on a structure-preserving flat representation of trees. A major difference from traditional tree mining techniques is that subtrees are constrained by their position in the original trees, leading to a drastic reduction in the number of rules generated, especially for data with great structural variation among tree instances. This characteristic is desirable given the current state of frequent pattern mining, where an excessive number of patterns hinders the practical use of results. However, the question remains whether this reduction comes at a high cost in accuracy and coverage rate. We explore this aspect and compare the approach with a state-of-the-art structural classifier based on the same subtree type but not positionally constrained in any way. We investigate the effect of using different pattern types (frequent or closed) or subtree types (induced, embedded or embedded-plus-disconnected subtrees) on the performance of the two classifiers. Different rule strength measures such as confidence, weighted confidence and likelihood are also examined in our study. The experiments on three real-world data sets reveal important similarities and differences between the methods. |
Database: | OpenAIRE |
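
The description mentions rule strength measures and position-constrained patterns without giving definitions. As a rough illustration only, the sketch below computes the standard confidence of a class association rule, support(pattern and class) / support(pattern), over a toy flat representation. The encoding of each tree as a set of `(position, label)` items, the function name `rule_confidence`, and the sample data are assumptions made for this example and are not taken from the paper; weighted confidence and likelihood are not shown.

```python
def rule_confidence(instances, pattern, target_class):
    """Confidence of the rule `pattern -> target_class`.

    instances: list of (items, class_label) pairs, where items is a set of
               (position, label) tuples (a hypothetical flat tree encoding).
    pattern:   set of (position, label) tuples that must all be present.
    """
    # Classes of all instances that contain every item of the pattern.
    covered = [cls for items, cls in instances if pattern <= items]
    if not covered:
        return 0.0  # the pattern matches nothing, so the rule has no support
    hits = sum(1 for cls in covered if cls == target_class)
    return hits / len(covered)


# Toy data (assumed): each tree is flattened to (position, label) items.
instances = [
    ({(0, "a"), (1, "b"), (2, "c")}, "pos"),
    ({(0, "a"), (1, "b"), (2, "d")}, "pos"),
    ({(0, "a"), (1, "c"), (2, "b")}, "neg"),  # "b" occurs, but at position 2
    ({(0, "a"), (1, "b")}, "neg"),
]

# The pattern requires label "b" at position 1, not anywhere in the tree.
pattern = frozenset({(0, "a"), (1, "b")})
confidence = rule_confidence(instances, pattern, "pos")
```

In this toy data the third instance contains the label "b" but at a different position, so the position-constrained pattern does not cover it. A matcher without the position constraint would count it, which hints at why constraining by position can yield fewer, more specific rules.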