A Bayesian Mixed Regression Based Prediction of Quantitative Traits from Molecular Marker and Gene Expression Data
Author: | Mikko J. Sillanpää, Madhuchhanda Bhattacharjee |
---|---|
Contributors: | Department of Mathematics and Statistics, Department of Agricultural Sciences |
Language: | English |
Year of publication: | 2011 |
Subject: |
Microarrays; 119 Other natural sciences; education; MODELS; Bayesian probability; lcsh:Medicine; Feature selection; LASSO; Computational biology; Biology; Biostatistics; 01 natural sciences; Models, Biological; Cross-validation; 010104 statistics & probability; 03 medical and health sciences; Bayes' theorem; Quantitative Trait, Heritable; Lasso (statistics); Covariate; Genetics; Bayesian hierarchical modeling; GENOME-WIDE ASSOCIATION; CROSS-VALIDATION; 0101 mathematics; Statistical Methods; lcsh:Science; 030304 developmental biology; Plant Diseases; 0303 health sciences; Multidisciplinary; Gene Expression Profiling; COMPONENTS; lcsh:R; Multilevel model; Statistics; 1184 Genetics, developmental biology, physiology; Computational Biology; Bayes Theorem; Genomics; VARIABLE SELECTION; Phenotype; lcsh:Q; Soybeans; Mathematics; Biomarkers; Research Article |
Source: | PLoS ONE, Vol 6, Iss 11, p e26959 (2011) |
ISSN: | 1932-6203 |
Description: | Molecular marker and gene expression data were considered both separately and jointly as additive predictors for two pathogen-activity phenotypes in real recombinant inbred lines of soybean. To predict unobserved phenotypes, we used Bayesian hierarchical regression modeling, in which the number of possible predictors in the model was controlled by the different selection strategies tested. Our initial findings were submitted to DREAM5 (the 5th Dialogue on Reverse Engineering Assessment and Methods challenge) and were judged to be the best in sub-challenge B3, in which both functional genomic and genetic data were used to predict the phenotypes. In this work we improve upon that earlier work by considering various predictor selection strategies, using cross-validation to measure the accuracy of in-data and out-data predictions. The results from various model choices indicate that, for these data, using both data types (functional genomic and genetic) simultaneously improves out-data prediction accuracy. Adequate goodness-of-fit can easily be achieved with more complex models for both phenotypes, since the number of potential predictors is large and the sample size is not. We also studied gene-set enrichment (for the continuous phenotype) in the biological process in question, as well as chromosomal enrichment of the gene set. The methodological contribution of this paper lies in the exploration of variable selection techniques to alleviate the problem of over-fitting. Different strategies based on the nature of the covariates were explored, and all methods were implemented under the Bayesian hierarchical modeling framework with indicator-based covariate selection. All models based on a careful variable selection procedure were found to produce significant results in a permutation test. |
Database: | OpenAIRE |
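
Note on the method named in the description: the abstract does not state the model equations, so the following is only a generic sketch of what a regression with indicator-based covariate selection commonly looks like. All symbols here ($y_i$, $x_{ij}$, $\gamma_j$, $\beta_j$, $\pi$, $\sigma^2_\beta$, $\sigma^2_e$) and the particular priors are illustrative assumptions, not the authors' specification.

```latex
% Generic indicator-based variable selection (assumed form, not taken from the paper).
% y_i    : phenotype of line i,  i = 1, ..., n
% x_{ij} : value of predictor j (marker genotype or expression level) for line i
% gamma_j: binary inclusion indicator for predictor j
\begin{align}
  y_i &= \mu + \sum_{j=1}^{p} \gamma_j \, \beta_j \, x_{ij} + e_i,
        & e_i &\sim \mathcal{N}(0, \sigma^2_e), \\
  \gamma_j &\sim \mathrm{Bernoulli}(\pi),
        & \beta_j &\sim \mathcal{N}(0, \sigma^2_\beta).
\end{align}
% Predictor j contributes to the fit only when gamma_j = 1, so the posterior
% mean of gamma_j can be read as a posterior inclusion probability.
```

Under this kind of formulation, a small prior inclusion probability $\pi$ keeps the number of active predictors low, which is one way the over-fitting mentioned in the description can be limited when $p$ is much larger than $n$.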