Author: |
Solomon, Norman; Oatley, Giles; McGarry, Kenneth |
Language: |
English |
Year of publication: |
2007 |
Subject: |
|
ISSN: |
2078-0958 |
Description: |
Imputation of missing data is important in many areas, such as reducing non-response bias in surveys and maintaining medical documentation. Nearest neighbour (NN) imputation algorithms replace the missing values within any particular observation by taking copies of the corresponding known values from the most similar observation found in the dataset. However, when NN algorithms are executed against large multivariate datasets the poor performance (program execution speed) of these algorithms can present major practical problems. We argue that these problems have not been sufficiently addressed, and we present a fast NN imputation algorithm that can employ any method for measuring the similarity between observations. The algorithm has been designed for the imputation of missing values in large multivariate datasets that contain many different missingness patterns with large proportions of missing data. The ideas underpinning the algorithm are explained in detail, and experiments are described which show that the algorithm delivers very good performance when it is used for imputation in both segmented and non-segmented datasets containing several million rows. |
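The basic NN imputation idea described in the abstract can be illustrated with a minimal brute-force sketch. This is not the fast algorithm the authors present, whose details are not given in this record. The function names `nn_impute` and `shared_euclidean`, the use of `None` to mark missing values, and the default distance measure are all assumptions made for illustration; the `distance` parameter is a hypothetical hook standing in for "any method for measuring the similarity between observations".

```python
import math


def shared_euclidean(a, b):
    """Euclidean distance over the columns observed in both rows.

    Assumed default measure for this sketch; returns infinity when
    the two rows share no observed columns.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def nn_impute(rows, distance=shared_euclidean):
    """Brute-force nearest-neighbour imputation (illustrative only).

    rows: list of equal-length lists; None marks a missing value.
    distance: callable(a, b) -> float; any similarity measure can be
    plugged in here (hypothetical hook, not the paper's interface).
    Returns a new list of rows; the input is left unchanged.
    """
    result = [list(r) for r in rows]
    for i, recipient in enumerate(rows):
        missing = [j for j, v in enumerate(recipient) if v is None]
        if not missing:
            continue
        best, best_d = None, math.inf
        for k, donor in enumerate(rows):
            # A donor must be a different row that has known values
            # in every column the recipient is missing.
            if k == i or any(donor[j] is None for j in missing):
                continue
            d = distance(recipient, donor)
            if d < best_d:
                best, best_d = donor, d
        if best is not None:
            # Copy the donor's known values into the missing positions.
            for j in missing:
                result[i][j] = best[j]
    return result


data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [9.0, 9.0, 9.0],
]
imputed = nn_impute(data)
```

In this sketch every incomplete row is compared against every other row, so the cost grows roughly with the square of the number of rows. That is the kind of execution-speed problem the abstract raises for large multivariate datasets.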
Database: |
OpenAIRE |
External link: |
|