Insight into neutral and disease-associated human genetic variants through interpretable predictors
Authors: | Tjaart A. P. de Beer, Dick de Ridder, Bastiaan A. van den Berg, Marcel J. T. Reinders |
---|---|
Language: | English |
Year of publication: | 2015 |
Subject: | Nonsynonymous substitution; Bioinformatics; Molecular Sequence Data; lcsh:Medicine; Sequence alignment; Computational biology; Biology; Polymorphism, Single Nucleotide; Protein sequencing; Genetic variation; Feature (machine learning); Humans; Life Science; Amino Acid Sequence; lcsh:Science; Peptide sequence; Sequence (medicine); Genetics; Multidisciplinary; Sequence Homology, Amino Acid; lcsh:R; Genetic Diseases, Inborn; Support vector machine; OA-Fund TU Delft; lcsh:Q; Research Article |
Source: | PLoS ONE, Vol 10, Iss 3, p e0120729 (2015) |
ISSN: | 1932-6203 |
Description: | A variety of methods that predict whether human nonsynonymous single nucleotide polymorphisms (SNPs) are neutral or disease-associated have been developed over the last decade. These methods are used to pinpoint disease-associated variants among the many variants obtained with next-generation sequencing technologies. The high performance of current sequence-based predictors indicates that sequence data contain valuable information about whether a variant is neutral or disease-associated. However, most predictors do not readily disclose this information, so it remains unclear which sequence properties are most important. Here, we show how insight into the sequence characteristics of variants and their surroundings can be obtained by interpreting predictors. We used an extensive range of features derived from the variant itself, its surrounding sequence, sequence conservation, and sequence annotation, and employed linear support vector machine classifiers so that feature importance could be extracted from the trained predictors. Our approach is useful for providing additional information about which features are most important for the predictions made. Furthermore, for large sets of known variants, it can provide insight into the mechanisms responsible for variants being disease-associated. |
Database: | OpenAIRE |
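The description says linear support vector machines were chosen because feature importance can be read directly from a trained model: each feature gets one weight, and its magnitude and sign indicate how strongly, and towards which class, that feature pushes a prediction. The following is a minimal sketch of that idea only, not the authors' code. It assumes a pure-Python Pegasos-style hinge-loss solver in place of whatever SVM library the study used, and the feature names (`conservation`, `hydrophobicity_change`, `noise`) and the synthetic data are hypothetical stand-ins for real variant features.

```python
import random
import statistics

# Hypothetical feature names, for illustration only (not the study's feature set).
FEATURES = ["conservation", "hydrophobicity_change", "noise"]


def make_toy_variants(n=500, seed=1):
    """Synthetic stand-in data: label +1 = disease-associated, -1 = neutral."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1  # balanced classes
        X.append([
            label * 1.0 + rng.gauss(0, 1),  # strongly informative feature
            label * 0.5 + rng.gauss(0, 1),  # weakly informative feature
            rng.gauss(0, 1),                # uninformative feature
        ])
        y.append(label)
    return X, y


def standardise(X):
    """Scale every column to zero mean and unit variance so weights are comparable."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    No bias term is learned: the features are centred and the classes balanced.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrinks the weights
            if margin < 1.0:  # hinge loss active: move towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rank_features(names, w):
    """Order features by |weight|; a positive sign pushes towards disease-associated."""
    return sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)


X, y = make_toy_variants()
weights = train_linear_svm(standardise(X), y)
ranking = rank_features(FEATURES, weights)
for name, weight in ranking:
    print(f"{name:>22s} {weight:+.3f}")
```

Standardising first matters for this kind of interpretation: a raw weight also absorbs the feature's units, so weight magnitudes are only comparable as importances when all features share a common scale.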