Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases

Author:	Meia Alsup, Purvesh Khatri, Mark M. Davis, Francesco Vallania, Edgar G. Engleman, Andrew Tam, Erika Bongen, Steven Schaffert, Winston A. Haynes, Tej D. Azad, Shane Lofgren, Michael N. Alonso
Language:	English
Year of publication:	2018
Subject:	0301 basic medicine; Computer science; Science; General Physics and Astronomy; Article; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; Matrix (mathematics); Humans; Disease; lcsh:Science; Microarray platform; Multidisciplinary; Basis (linear algebra); business.industry; Pattern recognition; General Chemistry; Microarray Analysis; Expression (mathematics); 030104 developmental biology; Minimal effect; Databases as Topic; ROC Curve; Leukocytes, Mononuclear; lcsh:Q; Artificial intelligence; Deconvolution; business
Source:	Nature Communications, Vol 9, Iss 1, Pp 1-8 (2018)
ISSN:	2041-1723
Description:	In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that different deconvolution methods have minimal or virtually no effect on the results once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those obtained using IRIS and LM22, across all methods. Our results demonstrate the importance of incorporating biological and technical heterogeneity in a basis matrix for achieving consistently high accuracy. Cell type deconvolution from bulk expression data relies on a reference expression matrix. Here, the authors introduce a basis matrix built using data from both healthy and diseased samples profiled on 42 platforms, reducing biases introduced by single-platform matrices built using healthy samples.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3ae166c327642bdc5640ffaaf3b3af23 https://doaj.org/article/6f12c79a854b4734a673942857dc8a8a
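
The description above treats deconvolution as estimating cell proportions from a mixed-cell expression profile, given a basis matrix. A minimal sketch of that idea follows, assuming a simple linear model (mixture ≈ basis × proportions) solved by non-negative least squares with projected gradient descent. This is an illustration only, not the immunoStates pipeline or any of the methods compared in the paper; the function name `deconvolve`, the toy basis values, and the proportions are all hypothetical.

```python
# Illustrative sketch only: a generic non-negative least-squares deconvolution.
# It is NOT the immunoStates method; all names and numbers here are made up.


def deconvolve(basis, mixture, iterations=2000):
    """Estimate cell-type proportions by projected gradient descent.

    basis:   list of rows, one per gene; each row holds one value per cell type.
    mixture: list of expression values, one per gene.
    Returns non-negative proportions normalized to sum to 1.
    """
    n_genes = len(basis)
    n_types = len(basis[0])
    # Step 1 / ||B||_F^2 never exceeds 1 / L, where L is the largest
    # eigenvalue of B^T B, so the descent stays stable.
    step = 1.0 / sum(v * v for row in basis for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(iterations):
        # Residual of the linear model for every gene.
        residual = [
            sum(basis[g][k] * weights[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||B w - m||^2 with respect to each weight.
        gradient = [
            sum(basis[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical toy basis matrix: 4 genes x 3 cell types (invented values).
BASIS = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]
# Noise-free synthetic mixture built from the toy basis and proportions.
MIXTURE = [sum(b * p for b, p in zip(row, TRUE_PROPORTIONS)) for row in BASIS]

estimated = deconvolve(BASIS, MIXTURE)
print([round(p, 3) for p in estimated])
```

On this noise-free toy input the estimate lands close to the proportions used to build the mixture. With real data the result depends heavily on which basis matrix is supplied, which is the point the description makes.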