A Hybrid Modified Deep Learning Data Imputation Method for Numeric Datasets
Authors: | Cemalettin Kubat, Nuran Peker |
---|---|
Year of publication: | 2021 |
Subject: | Mean squared error; Computer science; Deep learning; Value (computer science); Missing data; Computer Graphics and Computer-Aided Design; Data type; Random forest; Missing values; Data imputation; Artificial intelligence; Control and Systems Engineering; Data pre-processing; Data mining; Imputation (statistics); Information Systems |
Source: | International Journal of Intelligent Systems and Applications in Engineering; Vol. 9 No. 1 (2021); 6-11 |
ISSN: | 2147-6799 |
DOI: | 10.18201/ijisae.2021167931 |
Description: | Missing data is a major problem for both machine learning and data mining methods. Most of these methods do not work with missing data at all, and the performance of those that do can degrade. Imputation is a data preprocessing method that replaces missing data with appropriate values. This study aims to develop a hybrid modified imputation method based on a deep learning approach. For this purpose, Random Forest and Datawig deep learning imputation are used together, in a method called RF-DLI. Datawig is a deep learning-based library that supports missing value imputation for all types of data. The RF-DLI approach imputes missing data in three steps. First, the importance of each attribute of the dataset is determined with Random Forest (RF). Second, the most important 50% of the attributes are selected. Finally, each missing value is imputed with Datawig (DLI) using these most important attributes. The study uses six real-world datasets from different fields with 30% missing data. The imputation performance of RF-DLI is compared with the KNN, MICE, and MEAN imputation approaches in terms of the MAE, RMSE, and R2 evaluation metrics. The results show that in most cases RF-DLI has better imputation performance than the other techniques. |
Database: | OpenAIRE |
External link: |
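
The description reports imputation quality as MAE, RMSE, and R2 on data with 30% missing values, with MEAN imputation as one of the baselines. The record does not say how the missing entries were generated or exactly how the scores were computed, so the following is only a minimal sketch under stated assumptions: values are hidden uniformly at random, and the metrics are computed only on the hidden entries. The helper names (`mask_values`, `mean_impute`, `imputation_scores`) and the toy column are hypothetical and are not taken from the paper.

```python
import math
import random


def mask_values(column, missing_rate=0.3, seed=0):
    """Hide a random share of entries; return the masked column and hidden indices.

    Assumption: values go missing uniformly at random (not stated in the source).
    """
    rng = random.Random(seed)
    n_missing = round(len(column) * missing_rate)
    hidden = sorted(rng.sample(range(len(column)), n_missing))
    masked = list(column)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def mean_impute(masked):
    """MEAN baseline: fill every missing entry with the mean of the observed ones."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def imputation_scores(truth, imputed, hidden):
    """MAE, RMSE and R2, computed only on the entries that were hidden."""
    y = [truth[i] for i in hidden]
    y_hat = [imputed[i] for i in hidden]
    n = len(y)
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    rmse = math.sqrt(ss_res / n)
    y_bar = sum(y) / n
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    # R2 is undefined when all hidden true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Toy numeric column (made-up values, not one of the study's datasets).
column = [float(v) for v in range(1, 21)]
masked, hidden = mask_values(column, missing_rate=0.3, seed=0)
imputed = mean_impute(masked)
scores = imputation_scores(column, imputed, hidden)
print(scores)
```

Any other imputer, such as KNN, MICE, or an RF-DLI style model, could be scored the same way by swapping out `mean_impute` while keeping the same mask, which keeps the comparison between methods like for like.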