Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies.
Authors: | Tamba CL (State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China; Department of Mathematics, Egerton University, Egerton, Kenya); Ni YL (State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China); Zhang YM (State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China; Statistical Genomics Lab, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China) |
---|---|
Language: | English |
Source: | PLoS Computational Biology [PLoS Comput Biol] 2017 Jan 31; Vol. 13 (1), pp. e1005357. Date of Electronic Publication: 2017 Jan 31 (Print Publication: 2017). |
DOI: | 10.1371/journal.pcbi.1005357 |
Abstract: | A genome-wide association study (GWAS) examines a large number of single nucleotide polymorphisms (SNPs) in a limited sample of only hundreds of individuals, which poses a variable selection problem in a high-dimensional dataset. Although many single-locus GWAS approaches that control for polygenic background and population structure have been widely used, some significant loci still go undetected. In this study, we used an iterative modified sure independence screening (ISIS) approach to reduce the number of SNPs to a moderate size. The expectation-maximization (EM) Bayesian least absolute shrinkage and selection operator (BLASSO) was then used to estimate the effects of all selected SNPs for true quantitative trait nucleotide (QTN) detection. This method is referred to as the ISIS EM-BLASSO algorithm. Monte Carlo simulation studies validated the new method: compared with efficient mixed-model association (EMMA), smoothly clipped absolute deviation (SCAD), fixed and random model circulating probability unification (FarmCPU), and multi-locus random-SNP-effect mixed linear model (mrMLM), it had the highest empirical power in QTN detection and the highest accuracy in QTN effect estimation, and it was the fastest. To further demonstrate the new method, six flowering time traits in Arabidopsis thaliana were re-analyzed with four methods (the new method, EMMA, FarmCPU, and mrMLM). The new method identified most previously reported genes. Therefore, the new method is a good alternative for multi-locus GWAS. Competing Interests: The authors have declared that no competing interests exist. |
Database: | MEDLINE |
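
The two-stage idea in the abstract (screen SNPs down to a moderate number, then estimate the retained effects jointly with a shrinkage method) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain marginal-correlation screening in place of the modified screening statistic, a coordinate-descent LASSO as a stand-in for EM-BLASSO, no correction for population structure or polygenic background, and simulated genotypes coded 0/1/2. The function names `sis_screen`, `lasso_cd`, and `isis_select`, and the parameters `k`, `rounds`, and `lam`, are hypothetical.

```python
import numpy as np

def sis_screen(X, r, k):
    """Return indices of the k columns of X with the largest absolute
    marginal correlation with the vector r (sure independence screening)."""
    Xc = X - X.mean(axis=0)
    rc = r - r.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(rc)
    score = np.zeros(X.shape[1])
    ok = denom > 0  # constant (monomorphic) columns keep a score of 0
    score[ok] = np.abs(Xc[:, ok].T @ rc) / denom[ok]
    return np.argsort(score)[::-1][:k]

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumption: used here as a simple stand-in for EM-BLASSO."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()  # residual for the current beta (all zeros at the start)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * beta[j]  # remove column j's contribution
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]  # add the updated contribution back
    return beta

def isis_select(X, y, k=10, rounds=2, lam=0.1):
    """Iterative screening: screen on the current residual, refit the
    shrinkage model on all candidates, keep only non-zero effects."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    selected, beta = [], np.zeros(0)
    resid = yc.copy()
    for _ in range(rounds):
        remaining = np.array([j for j in range(X.shape[1]) if j not in selected])
        if remaining.size == 0:
            break
        top = remaining[sis_screen(Xc[:, remaining], resid, k)]
        candidates = sorted(set(selected) | set(top.tolist()))
        fit = lasso_cd(Xc[:, candidates], yc, lam)
        keep = fit != 0
        selected = [c for c, kept in zip(candidates, keep) if kept]
        beta = fit[keep]
        resid = yc - Xc[:, selected] @ beta
    return selected, beta

# Simulated example (hypothetical data): 300 individuals, 1000 SNPs, 3 true QTNs.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 1000)).astype(float)
qtn = [5, 50, 300]
y = X[:, qtn] @ np.array([1.0, -0.8, 0.7]) + rng.normal(0, 0.5, 300)
selected, effects = isis_select(X, y)
print(dict(zip(selected, np.round(effects, 2))))
```

Screening on the residual in later rounds gives markers that were masked by stronger effects in the first pass a second chance to enter the model, which is the motivation for iterating rather than screening once.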