Imputation of Discrete and Continuous Missing Values in Large Datasets Using Bayesian Based Ant Colony Optimization
Author: | Rajarajan Sivaraj, R. Devi Priya |
---|---|
Year of publication: | 2016 |
Subject: | Engineering; Multidisciplinary; Ant colony optimization algorithms; Bayesian probability; Probabilistic logic; Engineering and technology; Missing data; Natural sciences; Statistics & probability; Data quality; Covariate; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Data mining; Imputation (statistics); Mathematics; Statistical hypothesis testing |
Source: | Arabian Journal for Science and Engineering. 41:4981-4993 |
ISSN: | 2191-4281 1319-8025 |
Description: | When preparing large databases, obtaining quality data with no missing values is often almost impossible. Integrating raw data from multiple heterogeneous sources often leaves some values missing, which leads to loss of valuable information. Although researchers have introduced many methods, little effort has been spent on handling missing values in heterogeneous attributes (both discrete and continuous) under the Missing At Random pattern, the common scenario in which missingness depends on covariates in the dataset. In addition, only a few techniques can deal with missing values in large databases, a gap that calls for researchers' attention. This paper addresses both problems with a single technique called Bayesian Ant Colony Optimization (BACO), which combines the search capability of Ant Colony Optimization with the probabilistic nature of Bayesian principles. The algorithm is designed to efficiently impute missing values in both discrete and continuous attributes in large datasets. BACO is applied to six large real datasets, and its imputation accuracy is observed to exceed that of existing standard techniques. The statistical tests conducted also support the superior results of BACO in the imputation process. |
Database: | OpenAIRE |
External link: |
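
The description characterizes Missing At Random (MAR) as the case where missingness depends on covariates in the dataset. The following minimal Python sketch illustrates only that pattern; it is not the BACO algorithm, which the record does not specify. The function name `make_mar_missing`, the `age` and `income` fields, and all probabilities are hypothetical and chosen for illustration.

```python
import random


def make_mar_missing(rows, target, covariate, threshold, p_high, p_low, seed=0):
    """Return a copy of rows where `target` is set to None with a probability
    that depends only on the fully observed `covariate` (a MAR mechanism)."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        # Missingness probability is driven by the covariate, not by the target.
        p = p_high if row[covariate] > threshold else p_low
        if rng.random() < p:
            new_row[target] = None
        out.append(new_row)
    return out


def missing_rate(rows, target):
    """Fraction of rows in which `target` is missing (None)."""
    return sum(1 for r in rows if r[target] is None) / len(rows)


# Hypothetical synthetic data: one discrete covariate, one continuous target.
data_rng = random.Random(42)
rows = [
    {"age": data_rng.randint(20, 80), "income": data_rng.gauss(50.0, 10.0)}
    for _ in range(2000)
]

masked = make_mar_missing(
    rows, target="income", covariate="age", threshold=50, p_high=0.4, p_low=0.05
)

older = [r for r in masked if r["age"] > 50]
younger = [r for r in masked if r["age"] <= 50]
rate_older = missing_rate(older, "income")
rate_younger = missing_rate(younger, "income")
print(f"missing rate (age > 50):  {rate_older:.2f}")
print(f"missing rate (age <= 50): {rate_younger:.2f}")
```

Because the masking probability is a function of `age` alone, the two groups show clearly different missing rates for `income`, which is the dependency on covariates that an imputation method working under MAR has to account for.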