WEVar: a novel statistical learning framework for predicting noncoding regulatory variants.
Authors: | Wang Y; Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, 46202, USA., Jiang Y; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA., Yao B; Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA., Huang K; Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA., Liu Y; Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA., Wang Y; Department of Human Genetics, Emory University, Atlanta, GA 30322, USA., Qin X; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA., Saykin AJ; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA., Chen L; Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, 46202, USA. |
---|---|
Language: | English |
Source: | Briefings in bioinformatics [Brief Bioinform] 2021 Nov 05; Vol. 22 (6). |
DOI: | 10.1093/bib/bbab189 |
Abstract: | Understanding the functional consequences of noncoding variants is of great interest. Although genome-wide association studies and quantitative trait locus analyses have identified variants associated with traits or molecular phenotypes, most of these variants lie in noncoding regions, which makes identifying the causal variants particularly challenging. Existing computational approaches for prioritizing noncoding variants produce inconsistent and even conflicting results. To address these challenges, we propose WEVar, a novel statistical learning framework that directly integrates the precomputed functional scores from representative scoring methods. It makes full use of the integrated methods by automatically learning the relative contribution of each method, and it produces an ensemble score as the final prediction. The framework has two modes. The first, 'context-free' mode is trained on curated causal regulatory variants from a wide range of contexts and can be applied to predict regulatory variants of unknown and diverse context. The second, 'context-dependent' mode further improves prediction when the training and testing variants come from the same context. By evaluating the framework in both simulation and empirical studies, we demonstrate that it outperforms the integrated scoring methods and that the ensemble score successfully prioritizes experimentally validated regulatory variants in multiple risk loci. (© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.) |
Database: | MEDLINE |
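The abstract describes combining precomputed scores from several methods into one ensemble score by learning how much each method contributes. The sketch below illustrates that general idea only; it is not the WEVar algorithm, whose details are not given in this record. It assumes a plain logistic regression fitted by gradient descent, and the two "methods", the synthetic labels, and all function names (`fit_ensemble_weights`, `ensemble_score`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ensemble_weights(scores, labels, lr=0.5, epochs=500):
    """Learn one weight per scoring method with logistic regression.

    scores: list of rows; each row holds the precomputed scores of one
            variant, one entry per method (assumed to be on similar scales).
    labels: 1 for a known regulatory variant, 0 for a background variant.
    """
    n, m = len(scores), len(scores[0])
    w = [0.0] * m
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * m
        grad_b = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - y
            for j in range(m):
                grad_w[j] += err * x[j]
            grad_b += err
        # Batch gradient step on the mean log-loss.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ensemble_score(w, b, x):
    # Weighted combination of the individual scores, mapped to (0, 1).
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic toy data (an assumption, not real variant scores):
# method A is informative about the label, method B is pure noise.
random.seed(0)
labels = [i % 2 for i in range(200)]
scores = [[(2 * y - 1) + random.gauss(0, 1), random.gauss(0, 1)] for y in labels]

weights, intercept = fit_ensemble_weights(scores, labels)
high = ensemble_score(weights, intercept, [2.0, 0.0])
low = ensemble_score(weights, intercept, [-2.0, 0.0])
```

On this toy data the learned weight for the informative method should dominate the weight for the noise method, which is the sense in which an ensemble can learn each method's relative contribution. A 'context-dependent' variant of the same sketch would simply fit the weights on training variants drawn from the same context as the test variants.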