Data clustering and imputing using a two-level multi-objective genetic algorithm (GA): A case study of maintenance cost data for tunnel fans
Authors: | Yamur K. Aldouri, Hassan Al-Chalabi, Liangwei Zhang |
---|---|
Language: | English |
Year of publication: | 2018 |
Subject: | |
Source: | Cogent Engineering, Vol 5, Iss 1 (2018) |
Document type: | article |
ISSN: | 2331-1916 |
DOI: | 10.1080/23311916.2018.1513304 |
Description: | This study develops a new two-level multi-objective genetic algorithm (GA) to optimise clustering in order to reduce and impute missing cost data for fans used in road tunnels by the Swedish Transport Administration (Trafikverket). Level 1 uses a multi-objective GA based on fuzzy c-means to cluster cost data objects according to three main indices. The first is cluster centre outliers; the second is the compactness and separation (v_k) of the data points and cluster centres; the third is the intensity of data points belonging to the derived clusters. Our clustering model is validated using k-means clustering. Level 2 uses a multi-objective GA to impute the missing cost data, reduced in volume, using a valid data period. The optimal population has a low v_k, 0.1%, and a high intensity, 99%. It has three cluster centres, and the highest data reduction is 27%. These three cluster centres have a suitable geometry, so the cost data can be partitioned into relevant contents to be redacted for imputing. Our model shows better clustering detection and evaluation than models using k-means. The percentages of missing data for the two cost objects are 57% for labour and 81% for materials. The second level shows highly correlated data (R-squared 0.99) after imputing. Therefore, the study concludes that a multi-objective GA can cluster and impute data to derive complete data for forecasting. |
Database: | Directory of Open Access Journals |
External link: | |
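
The description above names fuzzy c-means and a compactness-and-separation index as the basis of Level 1, without giving formulas. As a rough illustration only, the sketch below runs plain fuzzy c-means and scores the result with the Xie-Beni index, one common compactness-and-separation measure. Whether the paper's index matches Xie-Beni is an assumption, and the function names, the deterministic initialisation, and the toy cost values are all hypothetical. The genetic algorithm layer and the imputation step are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centres, m):
    """Fuzzy memberships: each row sums to 1 across the clusters."""
    rows = []
    for p in points:
        # Clamp distances so a point sitting exactly on a centre is safe.
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centres]
        total = sum(inv)
        rows.append([x / total for x in inv])
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Plain fuzzy c-means; returns (centres, memberships).

    Centres are initialised from evenly spaced input points, which is a
    simplification chosen here to keep the sketch deterministic.
    """
    n, dim = len(points), len(points[0])
    centres = [points[(i * n) // c] for i in range(c)]
    for _ in range(iters):
        u = memberships(points, centres, m)
        new_centres = []
        for k in range(c):
            w = [row[k] ** m for row in u]   # fuzzified weights for cluster k
            sw = sum(w)
            new_centres.append(tuple(
                sum(wi * p[j] for wi, p in zip(w, points)) / sw
                for j in range(dim)))
        centres = new_centres
    return centres, memberships(points, centres, m)


def xie_beni(points, centres, u, m=2.0):
    """Xie-Beni index: fuzzy within-cluster scatter divided by
    n times the smallest squared distance between centres (lower is better)."""
    compact = sum(u[i][k] ** m * sq_dist(p, v)
                  for i, p in enumerate(points)
                  for k, v in enumerate(centres))
    separation = min(sq_dist(a, b)
                     for i, a in enumerate(centres)
                     for b in centres[i + 1:])
    return compact / (len(points) * separation)


# Hypothetical one-dimensional cost values forming two obvious groups.
costs = [(1.0,), (1.2,), (0.9,), (10.0,), (10.3,), (9.8,)]
centres, u = fuzzy_c_means(costs, c=2)
score = xie_beni(costs, centres, u)
```

In a multi-objective setting, a score like this could serve as one of the objectives to minimise, alongside whatever outlier and intensity measures the authors define.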