Improving Association Rule Mining Using Clustering-based Discretization of Numerical Data

Author: Swee Chuan Tan
Year of publication: 2018
Source: 2018 International Conference on Intelligent and Innovative Computing Applications (ICONIC).
Description: Association rule mining is an important data mining technique that helps discover interesting attribute relationships that are useful for decision making. Most association rule mining methods use an item-set manipulation approach, which requires the data to be categorical. When a dataset contains numerical attributes, they need to be discretized before rule mining. At present, most unsupervised data discretization methods do not account for data distributions, so users have to try different methods and discretization settings to improve rule mining results. In this paper, we propose using TwoStep clustering for data discretization. Unlike simple discretization methods, TwoStep automatically determines the discretization intervals by taking into account the data distribution of each attribute. In our experiments, we evaluated the performance of the Apriori algorithm on four datasets, each pre-processed using TwoStep and three other commonly used discretization methods. Our results show that TwoStep produced the greatest number of high-quality rules compared with the other discretization methods.
Database: OpenAIRE
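
The contrast the abstract draws between simple discretization and clustering-based discretization can be illustrated with a minimal Python sketch. It is not the paper's method: a plain one-dimensional k-means stands in for TwoStep clustering, and equal-width and equal-frequency binning are assumed here as examples of simple methods (the abstract does not name the three it compares against). The function names and the `ages` sample are hypothetical.

```python
from typing import List


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant attribute
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Assign roughly the same number of values to each of k bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = min(rank * k // len(values), k - 1)
    return labels


def kmeans_1d_bins(values: List[float], k: int, iters: int = 50) -> List[int]:
    """Cluster-based bins via 1-D k-means (a stand-in, NOT TwoStep)."""
    s = sorted(values)
    # Deterministic start: centers at evenly spaced quantiles.
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Renumber bins so that indices increase with the center value.
    rank = {j: r for r, j in enumerate(sorted(range(k), key=lambda j: centers[j]))}
    return [rank[lab] for lab in labels]


# Hypothetical attribute with two dense groups and one outlier.
ages = [21, 22, 23, 24, 25, 40, 41, 42, 43, 80]
print("equal width    :", equal_width_bins(ages, 3))
print("equal frequency:", equal_frequency_bins(ages, 3))
print("1-D k-means    :", kmeans_1d_bins(ages, 3))
```

On this sample, the two simple methods each cut through one of the dense groups, while the clustering-based bins follow the gaps in the data. The resulting bin labels would then be treated as categorical items by an item-set miner such as Apriori.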