Imputation of Discrete and Continuous Missing Values in Large Datasets Using Bayesian Based Ant Colony Optimization

Authors: Rajarajan Sivaraj, R. Devi Priya
Year of publication: 2016
Subject:
Source: Arabian Journal for Science and Engineering, 41:4981-4993
ISSN: 2191-4281, 1319-8025
Description: When preparing large databases, obtaining quality data for analysis without any missing values is almost impossible in many cases. Integrating raw data from multiple heterogeneous sources often leaves some values missing, leading to a loss of valuable information. Although researchers have introduced many methods, comparatively little effort has been spent on handling missing values in heterogeneous attributes (both discrete and continuous) under the Missing At Random pattern, the common scenario in which missingness depends on other covariates in the dataset. In addition, only a few techniques are capable of dealing with missing values in large databases, and this problem demands the immediate attention of researchers. This paper addresses both problems by introducing a single technique called Bayesian Ant Colony Optimization (BACO), which combines the search capability of Ant Colony Optimization with the probabilistic nature of Bayesian principles. The algorithm is designed so that missing values in both discrete and continuous attributes of large datasets are imputed efficiently. BACO is applied to six large real-world datasets, and its imputation accuracy is observed to exceed that of existing standard techniques. Statistical tests also confirm the superior imputation results of BACO.
Database: OpenAIRE