Missing data processing based on attribute values partitioning and ant colony clustering
Author: | Chin-hung Li, 李進鴻 |
---|---|
Year of publication: | 2009 |
Document type: | Thesis (學位論文) |
Description: | 97 Data mining is an important technique for extracting useful knowledge from raw data. Managers can use the mined knowledge to make sound decisions. However, missing data can significantly distort data mining results, so preprocessing of missing values is critical to successful data mining. Data clustering partitions a dataset into clusters so that the data records in each cluster share common characteristics. These shared characteristics can be used to predict missing values. In this study, we propose an attribute values partitioning technique that preserves the relationships between attributes when estimating missing values. In addition, the ant colony optimization (ACO) algorithm has recently been proposed by a few researchers for solving data clustering problems. We propose an improved ACO clustering approach and use the resulting ant clustering as a basis for estimating missing data. Furthermore, we integrate attribute values partitioning with the ant clustering technique to improve estimation performance. The effectiveness of the proposed approaches is demonstrated on four datasets at four different rates of missing data. The empirical evaluation shows that the improved ant clustering algorithm outperforms previous methods in clustering quality, and that the integrated missing data processing approach provides competitive or favorable results compared with existing methods. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |
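
The description states that records in a cluster share characteristics that can be used to predict missing values. A minimal sketch of that general idea follows; it is not the thesis's method. As labeled assumptions: plain k-means stands in for the improved ant colony clustering, attribute values partitioning is not shown, and the function names (`nearest`, `kmeans`, `impute`) and the toy data are hypothetical.

```python
import math

def nearest(record, centroids):
    """Index of the centroid closest to record, using observed attributes only."""
    def dist(c):
        return math.sqrt(sum((x - m) ** 2 for x, m in zip(record, c) if x is not None))
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))

def kmeans(rows, k, iters=20):
    """Plain k-means on complete rows (assumption: a stand-in for ant clustering)."""
    centroids = [list(r) for r in rows[:k]]  # deterministic init: first k rows
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r, centroids)].append(r)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def impute(data, k=2):
    """Fill None entries with the matching attribute of the nearest centroid."""
    complete = [r for r in data if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in data:
        c = centroids[nearest(r, centroids)]
        filled.append([c[j] if x is None else x for j, x in enumerate(r)])
    return filled

# Hypothetical toy data: two well-separated groups, two incomplete records.
data = [
    [1.0, 2.0],
    [9.0, 10.0],
    [1.2, 1.8],
    [8.8, 10.2],
    [1.1, None],
    [None, 9.9],
]
filled = impute(data, k=2)
```

In this toy run, each incomplete record is matched to a cluster using only its observed attribute, and the missing attribute is taken from that cluster's mean. Any clustering procedure that yields cluster representatives could replace `kmeans` here.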