Popis: |
  Background: The Toxicological Priority Index (ToxPi) is a method for prioritizing and profiling chemicals that integrates data from diverse sources. However, individual data sources (“assays”), such as in vitro bioassays or in vivo study endpoints, often contain blocks of missing data, wherein subsets of chemicals have not been tested in all assays. To investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs for chemicals on the Agency for Toxic Substances and Disease Registry’s (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources.

  Results: Our simulations explored a wide range of scenarios across the data (0-80% of assay data missing per chemical), the model (ToxPi models containing 160-700 different assays), and the imputation method (k-Nearest-Neighbor (kNN), Max, Mean, Min, Binomial, Local Least Squares, and Singular Value Decomposition (SVD)). We find that most imputation methods result in significant changes to ToxPi scores, except for datasets with a small number of assays. Considering rank change conditional on these significant score changes, we find that chemical ranks under minimum-value, SVD, and kNN imputation are the most sensitive to the score changes.

  Conclusions: We found that the choice of imputation strategy exerted significant influence over both scores and their associated ranks, and that the most sensitive scenarios were those involving fewer assays and higher proportions of missing data. By characterizing the effects of missing data and the relative benefits of imputation approaches across real-world data scenarios, we can strengthen confidence in the robustness of decisions regarding the health and ecological effects of environmental chemicals.
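
  The simulation design above (randomly masking 0-80% of assay values per chemical, imputing, and comparing the resulting scores and ranks) can be illustrated with a minimal sketch. This is not the authors’ code: the data are synthetic, the score is an equal-weight row mean standing in for a full ToxPi model, and only three of the seven imputers (mean and kNN via scikit-learn’s SimpleImputer and KNNImputer, plus a hand-rolled minimum-value imputer) are shown for brevity.

  ```python
  import numpy as np
  from sklearn.impute import KNNImputer, SimpleImputer

  rng = np.random.default_rng(0)
  n_chemicals, n_assays = 100, 160              # 160 = smallest model size studied
  X_true = rng.random((n_chemicals, n_assays))  # synthetic scaled assay responses

  def mask_at_random(X, frac, rng):
      """Set `frac` of the assay values in each chemical's row to NaN."""
      X = X.copy()
      n_miss = int(frac * X.shape[1])
      for row in X:
          row[rng.choice(X.shape[1], size=n_miss, replace=False)] = np.nan
      return X

  def min_impute(X):
      """Replace each NaN with the observed minimum of its assay (column)."""
      X = X.copy()
      col_min = np.nanmin(X, axis=0)
      rows, cols = np.where(np.isnan(X))
      X[rows, cols] = col_min[cols]
      return X

  imputers = {
      "mean": lambda X: SimpleImputer(strategy="mean").fit_transform(X),
      "min":  min_impute,
      "kNN":  lambda X: KNNImputer(n_neighbors=5).fit_transform(X),
  }

  score_true = X_true.mean(axis=1)               # equal-weight surrogate score
  rank_true = np.argsort(-score_true).argsort()  # 0 = highest-ranked chemical

  for frac in (0.2, 0.5, 0.8):                   # fractions of missing data
      X_miss = mask_at_random(X_true, frac, rng)
      for name, impute in imputers.items():
          score_hat = impute(X_miss).mean(axis=1)
          rank_hat = np.argsort(-score_hat).argsort()
          shift = np.abs(rank_true - rank_hat).mean()
          print(f"{frac:.0%} missing, {name:>4}: mean |rank shift| = {shift:.1f}")
  ```

  The printed rank shifts give a feel for how imputation-induced score perturbations propagate to priority ranks; the study’s full analysis additionally covers the Max, Binomial, Local Least Squares, and SVD imputers and uses real ToxCast/Tox21 data for the SPL chemicals.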