Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques
Author: | Zahidul Islam, Md. Geaur Rahman |
---|---|
Publication year: | 2013 |
Subject: | Data cleansing; Information Systems and Management; Computer science; Decision tree learning; Decision tree; Missing data; Management Information Systems; Data set; Artificial Intelligence; Statistics; Data pre-processing; Imputation (statistics); Data mining; Categorical variable; Software |
Source: | Knowledge-Based Systems. 53:51-65 |
ISSN: | 0950-7051 |
Description: | We present two novel techniques for imputing both categorical and numerical missing values. The techniques use decision trees and decision forests to identify horizontal segments of a data set, where the records within a segment have higher similarity and stronger attribute correlations. The missing values are then imputed using these similarities and correlations. To improve imputation quality, some segments are merged using a novel approach. We use nine publicly available data sets to compare our techniques experimentally with a few existing ones on four commonly used evaluation criteria. Statistical analyses, such as confidence interval analysis, indicate a clear superiority of our techniques. |
Database: | OpenAIRE |
External link: |
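
The general idea in the description (segment the records with a tree, then impute inside each segment) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single-split tree (a decision stump) on one fully observed numerical attribute, fills numerical gaps with the segment mean and categorical gaps with the segment mode, and omits the segment-merging step entirely. The function names (`leaf_value`, `impurity`, `best_threshold`, `impute`) and the sample records are hypothetical.

```python
from collections import Counter
from statistics import mean


def leaf_value(values):
    """Mean for numerical values, most frequent value for categorical ones."""
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)
    return Counter(values).most_common(1)[0][0]


def impurity(values):
    """Squared error around the mean, or the number of non-majority labels."""
    if not values:
        return 0.0
    centre = leaf_value(values)
    if isinstance(centre, (int, float)):
        return sum((v - centre) ** 2 for v in values)
    return sum(1 for v in values if v != centre)


def best_threshold(rows, target, predictor):
    """Threshold on `predictor` that minimises the total impurity of `target`."""
    candidates = sorted({r[predictor] for r in rows})
    best_score, best_t = float("inf"), None
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [r[target] for r in rows if r[predictor] <= t]
        right = [r[target] for r in rows if r[predictor] > t]
        score = impurity(left) + impurity(right)
        if score < best_score:
            best_score, best_t = score, t
    return best_t


def impute(rows, target, predictor):
    """Return copies of `rows` with missing `target` values filled per segment.

    Assumes `predictor` has no missing values and that at least one record
    has an observed `target` value.
    """
    complete = [r for r in rows if r[target] is not None]
    t = best_threshold(complete, target, predictor)
    filled = []
    for r in rows:
        r = dict(r)  # leave the caller's records untouched
        if r[target] is None:
            if t is None:  # no split possible: the whole data set is one segment
                segment = complete
            else:
                side = r[predictor] <= t
                segment = [c for c in complete if (c[predictor] <= t) == side]
            r[target] = leaf_value([c[target] for c in segment])
        filled.append(r)
    return filled


# Hypothetical records; None marks a missing value.
rows = [
    {"age": 22, "income": 20.0, "grade": "junior"},
    {"age": 25, "income": 24.0, "grade": "junior"},
    {"age": 47, "income": 60.0, "grade": "senior"},
    {"age": 52, "income": 64.0, "grade": "senior"},
    {"age": 24, "income": None, "grade": "junior"},
    {"age": 50, "income": 62.0, "grade": None},
]
print(impute(rows, "income", "age")[4])
print(impute(rows, "grade", "age")[5])
```

Imputing from the records in the same segment, rather than from the whole data set, is what lets the filled value reflect the similarity among those records; a full tree or forest would produce finer segments than the single split used here.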