A compilation of tri-allelic SNPs from 1000 Genomes and use of the most polymorphic loci for a large-scale human identification panel
Author: | Michelle A. Peck, M. de la Puente, Christopher Phillips, Thomas J. Parsons, Jorge Ruiz-Ramírez, F. Bittner, Y. Wang, Maria Victoria Lareu, S. Idrizbegovic, Jorge Amigo, Andreas O. Tillmar |
---|---|
Year of publication: | 2019 |
Subject: |
0301 basic medicine
Forensic Genetics Heterozygote Genotype Population Datasets as Topic Single-nucleotide polymorphism Biology Genome Polymorphism Single Nucleotide Pathology and Forensic Medicine Structural variation 03 medical and health sciences 0302 clinical medicine Gene Frequency Genetics Humans 030216 legal & forensic medicine 1000 Genomes Project Genetics education Allele frequency Genotyping Alleles education.field_of_study Genome Human High-Throughput Nucleotide Sequencing Pedigree 030104 developmental biology Genetics Population Tri-allelic SNPs 1000 Genomes Missing persons identification Massively parallel sequencing |
Source: | Forensic Science International: Genetics, vol. 46 |
ISSN: | 1878-0326 |
Description: | In a directed search of 1000 Genomes Phase III variation data, 271,934 tri-allelic single nucleotide polymorphisms (SNPs) were identified amongst the genotypes of 2,504 individuals from 26 populations. The majority of tri-allelic SNPs have three substitution-based alleles at the same position, while a much smaller proportion, which we did not compile, combine an insertion/deletion allele with substitution alleles. SNPs with three alleles have higher discrimination power than binary loci while remaining equally well suited to amplifying the fragmented DNA found in highly degraded forensic samples. Although most of the tri-allelic SNPs identified had one or two alleles at low frequencies, often single observations, we present a full compilation of the genome positions, rs-numbers and genotypes of all tri-allelic SNPs detected by the 1000 Genomes project in its more detailed analyses of the Phase III sequence data. A total of 8,705 tri-allelic SNPs had overall heterozygosities (averaged across all 1000 Genomes populations) higher than 0.5, the maximum value possible for a binary SNP. Of these, 1,637 displayed the highest average heterozygosity values, of 0.6-0.666. The most informative tri-allelic SNPs we identified were used to construct a large-scale human identification panel for massively parallel sequencing (MPS), designed for the identification of missing persons. The panel comprised 1,241 autosomal and 29 X-chromosome tri-allelic SNPs, plus 46 microhaplotypes adapted for genotyping from reduced-length sequences. For the 1,270 tri-allelic SNPs of the final MPS panel, allele frequency estimates are detailed for the African, European, South Asian and East Asian population groups, plus the Peruvian population sampled by 1000 Genomes. We describe the selection criteria, kinship simulation experiments and genomic analyses used to select the tri-allelic SNP components of the panel.
Approximately 5% of the tri-allelic SNPs selected for the large-scale MPS identification panel gave three-genotype patterns in single individual samples or discordant genotypes for genomic control DNAs. A likely explanation for some of these unreliably genotyped loci is that they map to multiple sites in the genome, since such patterns can arise from common structural variation such as segmental duplications. This highlights the need for caution and detailed scrutiny of multiple-allele variant data when designing future forensic SNP panels. |
Funding agencies: | MAPA, Multiple Allele Polymorphism Analysis [BIO2016-78525-R]; Spanish Research State Agency (AEI); ERDF funds; Conselleria de Cultura, Educacion e Ordenacion Universitaria; Conselleria de Economia, Emprego e Industria of the Xunta de Galicia [ED481B 2017/088] |
Database: | OpenAIRE |
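The heterozygosity thresholds in the abstract (0.5 as the binary SNP maximum, about 0.666 for a tri-allelic SNP) follow from the standard expected-heterozygosity formula H = 1 - Σ p_i², which peaks when all alleles are equally frequent. The sketch below is a minimal illustration, assuming "heterozygosity" means Hardy-Weinberg expected heterozygosity computed from allele frequencies and averaged without weighting across populations; the paper's exact averaging scheme is not stated in this record. It also compares power of discrimination (1 minus the sum of squared genotype frequencies) for binary and tri-allelic loci, and flags a single sample showing three well-supported alleles. All function names, the `min_fraction` read threshold and the toy frequencies are hypothetical, not the authors' pipeline or 1000 Genomes data.

```python
from itertools import combinations_with_replacement
from typing import List, Mapping, Sequence

BINARY_MAX_H = 0.5        # 1 - 2 * (1/2)^2
TRIALLELIC_MAX_H = 2 / 3  # 1 - 3 * (1/3)^2, i.e. the ~0.666 ceiling


def _normalise(freqs: Sequence[float]) -> List[float]:
    """Rescale allele frequencies (or counts) so they sum to 1."""
    total = float(sum(freqs))
    if total <= 0:
        raise ValueError("allele frequencies must sum to a positive value")
    return [f / total for f in freqs]


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Hardy-Weinberg expected heterozygosity, H = 1 - sum(p_i^2)."""
    p = _normalise(freqs)
    return 1.0 - sum(x * x for x in p)


def mean_heterozygosity(pop_freqs: Mapping[str, Sequence[float]]) -> float:
    """Unweighted mean of per-population H (an assumed averaging scheme)."""
    values = [expected_heterozygosity(f) for f in pop_freqs.values()]
    return sum(values) / len(values)


def power_of_discrimination(freqs: Sequence[float]) -> float:
    """PD = 1 - sum of squared genotype frequencies under Hardy-Weinberg."""
    p = _normalise(freqs)
    total = 0.0
    # Each unordered allele pair (i, j) is one genotype: p_i^2 or 2 p_i p_j.
    for i, j in combinations_with_replacement(range(len(p)), 2):
        g = p[i] * p[i] if i == j else 2 * p[i] * p[j]
        total += g * g
    return 1.0 - total


def flag_three_allele_sample(read_counts: Mapping[str, int],
                             min_fraction: float = 0.1) -> bool:
    """True if more than two alleles each reach min_fraction of the reads.

    A diploid individual carries at most two alleles at a single-copy locus,
    so a third well-supported allele may indicate that the amplified
    sequence maps to more than one genomic site (hypothetical QC rule).
    """
    depth = sum(read_counts.values())
    if depth == 0:
        return False
    supported = [a for a, n in read_counts.items() if n / depth >= min_fraction]
    return len(supported) > 2


if __name__ == "__main__":
    print(round(expected_heterozygosity([0.5, 0.5]), 3))     # 0.5
    print(round(expected_heterozygosity([1, 1, 1]), 3))      # 0.667
    print(round(power_of_discrimination([0.5, 0.5]), 3))     # 0.625
    print(round(power_of_discrimination([1, 1, 1]), 3))      # 0.815
    toy_pops = {"pop1": [0.4, 0.35, 0.25], "pop2": [0.5, 0.3, 0.2]}
    print(mean_heterozygosity(toy_pops) > BINARY_MAX_H)      # True
    print(flag_three_allele_sample({"A": 50, "C": 45, "G": 30}))  # True
```

The toy populations illustrate the selection idea: a locus passes the "higher than the binary maximum" filter only if its averaged heterozygosity exceeds 0.5, which no two-allele SNP can do. The read-fraction cut-off in `flag_three_allele_sample` is an arbitrary placeholder; a real panel would need a threshold calibrated against sequencing error and possible sample mixtures.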