Reducing cryptic relatedness in genomic data sets via a central node exclusion algorithm.
Author: | Fonseca PAS; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil., Leal TP; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil., Santos FC; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil., Gouveia MH; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil., Id-Lahoucine S; Center for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada., Rosse IC; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil., Ventura RV; Center for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada.; Beef Improvement Opportunities, Guelph, ON, Canada., Bruneli FAT; Embrapa Dairy Cattle, Juiz de Fora, MG, Brazil., Machado MA; Embrapa Dairy Cattle, Juiz de Fora, MG, Brazil., Peixoto MGCD; Embrapa Dairy Cattle, Juiz de Fora, MG, Brazil., Tarazona-Santos E; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil., Carvalho MRS; Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil. |
---|---|
Language: | English |
Source: | Molecular ecology resources [Mol Ecol Resour] 2018 May; Vol. 18 (3), pp. 435-447. Date of Electronic Publication: 2018 Jan 25. |
DOI: | 10.1111/1755-0998.12746 |
Abstract: | Cryptic relatedness is a confounding factor in genetic diversity and genetic association studies. Developing strategies to reduce cryptic relatedness in a sample is a crucial step for downstream genetic analyses. This study evaluates the applicability of a node selection algorithm, based on network degree centrality, and its impact on the evaluation of genetic diversity and population stratification. A total of 1,036 Guzerá (Bos indicus) females were genotyped using the Illumina Bovine SNP50 v2 BeadChip. Four strategies were compared. The first and second strategies consist of an iterative exclusion of the most related individuals based on the PLINK kinship coefficient (φij) and VanRaden's φij, respectively. The third and fourth strategies were based on a node selection algorithm. The fourth strategy, Network G matrix, preserved the largest number of individuals, with better diversity and representation of the initial sample. Determining the most probable number of populations was directly affected by the kinship metric. Network G matrix was the best strategy for reducing relatedness because it produced a larger sample, with more distantly related individuals, a distribution in the MDS plots more similar to that of the full data set, and a better representation of the population structure. Resampling strategies using VanRaden's φij as the relationship metric were better at inferring the relationships among individuals. Moreover, the resampling strategies directly affect the genomic inflation values in genomewide association studies. The use of the node selection algorithm also implies better selection of the most central individuals to be removed, providing a more representative sample. (© 2017 John Wiley & Sons Ltd.) |
Database: | MEDLINE |
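
The abstract describes node selection by network degree centrality without giving implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that individuals are nodes, that an edge joins any pair whose kinship coefficient exceeds a chosen threshold, and that the individual with the most edges is removed repeatedly until no related pairs remain. The function name `prune_related`, the threshold value, and the toy kinship values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prune_related(
    samples: List[str],
    kinship: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedy central-node exclusion (illustrative sketch only).

    Assumes every ID used in `kinship` also appears in `samples`.
    """
    # Build the relatedness network: one edge per pair above the threshold.
    neighbors: Dict[str, Set[str]] = {s: set() for s in samples}
    for (a, b), phi in kinship.items():
        if a != b and phi > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    while neighbors:
        # Most central node = highest degree; ties go to the earliest sample.
        hub = max(neighbors, key=lambda s: len(neighbors[s]))
        if not neighbors[hub]:
            break  # no related pairs are left
        # Remove the hub and detach it from its remaining neighbors.
        for other in neighbors.pop(hub):
            neighbors[other].discard(hub)

    # Preserve the original sample order in the output.
    return [s for s in samples if s in neighbors]


# Toy data (made up): A is closely related to B, C and D,
# while B, C and D are unrelated to one another.
toy_kinship = {
    ("A", "B"): 0.25,
    ("A", "C"): 0.25,
    ("A", "D"): 0.125,
    ("B", "C"): 0.01,
}
kept = prune_related(["A", "B", "C", "D"], toy_kinship, threshold=0.1)
print(kept)
```

In this toy case, removing the single central individual A resolves all three related pairs, whereas dropping one member of each related pair in turn could discard more animals. That is the intuition behind the abstract's claim that the network-based strategies retain a larger sample.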