Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome

Authors: Nattaya Tangthawornchaikul, Worachart Lert-itthiporn, Anavaj Sakuntabhai, Prida Malasit, Harald Grove, Fumihiko Matsuda, Prapat Suriyaphol, Bhoom Suktitipat
Contributors: Mahidol University [Bangkok]; Génétique fonctionnelle des Maladies infectieuses - Functional Genetics of Infectious Diseases, Institut Pasteur [Paris]-Centre National de la Recherche Scientifique (CNRS); Kyoto University [Kyoto]
Funding: This research was supported by the National Research Council of Thailand (NRCT) (grant number 92/2550). Additionally, the research was partially funded by the Office of the Higher Education Commission and Mahidol University under the National Research Universities Initiative. W. Lert-itthiporn was supported by a scholarship from the Medical Scholars Program, Mahidol University. P. Malasit was supported by an NSTDA Chair Professor Grant. P. Suriyaphol was supported by a National Research University (NRU) Grant through Mahidol University and TRF-office for R&D.
Language: English
Year of publication: 2018
Subjects:
0301 basic medicine
Pan-Asian SNP
Linkage Disequilibrium
MESH: Genotype
Gene Frequency
MESH: Child
Statistics
International HapMap Project
Child
Genetics (clinical)
MESH: Genetic Association Studies
MESH: Asian Continental Ancestry Group
MESH: Polymorphism, Single Nucleotide
MESH: Infant
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
MESH: Reproducibility of Results
MESH: Linkage Disequilibrium
SNP annotation
Child, Preschool
[SDV.MP.VIR]Life Sciences [q-bio]/Microbiology and Parasitology/Virology
Research Article
lcsh:Internal medicine
dbSNP
Reference
Adolescent
Genotype
lcsh:QH426-470
MESH: Genetics, Population
Single-nucleotide polymorphism
MESH: Molecular Sequence Annotation
Biology
Southeast Asian
Polymorphism, Single Nucleotide
03 medical and health sciences
Asian People
MESH: Gene Frequency
Genetics
Humans
[SDV.MP.PAR]Life Sciences [q-bio]/Microbiology and Parasitology/Parasitology
1000 Genomes Project
lcsh:RC31-1245
Genetic Association Studies
MESH: Genome, Human
Genetic association
Imputation
MESH: Adolescent
MESH: Humans
[SDV.GEN.GPO]Life Sciences [q-bio]/Genetics/Populations and Evolution [q-bio.PE]
Genome, Human
MESH: Child, Preschool
Infant
Reproducibility of Results
Molecular Sequence Annotation
MESH: Haplotypes
lcsh:Genetics
Genetics, Population
030104 developmental biology
Haplotypes
[SDV.GEN.GH]Life Sciences [q-bio]/Genetics/Human genetics
[SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Imputation (genetics)
Source: BMC Medical Genetics, Vol 19, Iss 1, Pp 1-10 (2018)
BMC Medical Genetics, BioMed Central, 2018, 19 (1), pp.23. ⟨10.1186/s12881-018-0534-8⟩
ISSN: 1471-2350
DOI: 10.1186/s12881-018-0534-8
Description:
Background: Imputation involves the inference of untyped single nucleotide polymorphisms (SNPs) in genome-wide association studies. The haplotypic reference of choice for imputation in Southeast Asian populations is unclear. Moreover, the influence of SNP annotation on imputation results has not been examined.
Methods: This study was divided into two parts. In the first part, we applied imputation to genotyped SNPs from Southeast Asian populations in the Pan-Asian SNP database. Five percent of the total SNPs were removed, the remaining SNPs were imputed with IMPUTE2, and the imputed outcomes were verified against the removed SNPs. We compared two imputation references: Chinese and Japanese haplotypes from HapMap phase II (HMII) and the complete set of haplotypes from the 1000 Genomes Project (1000G). The second part assessed imputation accuracy and yield in a Thai patient dataset. Half of the autosomal SNPs were removed to create Set 1. Another dataset, Set 2, was then created by removing the other half of the SNPs instead. Both Set 1 and Set 2 were imputed with HMII to create a complete imputed SNP dataset. This dataset was used to validate association testing, SNP annotation, and imputation outcome.
Results: Accuracy was highest for all populations when using the HMII reference, but at the cost of a lower yield. Thai genotypes showed the highest accuracy among the populations with both the HMII and 1000G panels, although accuracy and yield varied across chromosomes. Imputation was tested in a clinical dataset to compare accuracy in gene-related regions, and coding regions were found to have higher accuracy and yield.
Conclusions: This work provides the first evidence on imputation reference selection for Southeast Asian studies and highlights the effect of SNP location relative to genes on imputation outcome. Researchers will need to consider the trade-off between accuracy and yield in future imputation studies.
Electronic supplementary material: The online version of this article (10.1186/s12881-018-0534-8) contains supplementary material, which is available to authorized users.
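Illustrative sketch: the masking-and-validation design described in the Methods can be outlined in a few lines of Python. This is not the authors' pipeline. The function names `mask_snps` and `accuracy_and_yield` are hypothetical, and the abstract does not give formal definitions of the two metrics, so the sketch assumes that accuracy means genotype concordance among the held-out genotypes that received a call, and that yield means the share of held-out genotypes that received a call at all. The imputation step itself (IMPUTE2 in the study) is not reproduced here.

```python
import random


def mask_snps(snp_ids, fraction=0.05, seed=0):
    """Split SNP ids into (kept, masked); `masked` holds about `fraction` of them.

    Hypothetical helper: the study removed 5% of SNPs before imputation.
    """
    rng = random.Random(seed)
    n_masked = round(len(snp_ids) * fraction)
    masked = set(rng.sample(list(snp_ids), n_masked))
    kept = [s for s in snp_ids if s not in masked]
    return kept, sorted(masked)


def accuracy_and_yield(truth, imputed):
    """Compare imputed calls with the held-out (masked) genotypes.

    truth:   dict mapping (sample, snp) -> true genotype, e.g. "AG"
    imputed: dict mapping (sample, snp) -> imputed genotype, or None for no call
    Returns (accuracy, yield) under the assumed definitions:
      accuracy = concordant calls / calls made
      yield    = calls made / held-out genotypes
    """
    called = [(key, imputed.get(key)) for key in truth]
    called = [(key, g) for key, g in called if g is not None]
    # Genotypes are unphased here, so "AG" and "GA" count as the same call.
    n_correct = sum(1 for key, g in called if sorted(g) == sorted(truth[key]))
    n_called = len(called)
    yield_ = n_called / len(truth) if truth else 0.0
    accuracy = n_correct / n_called if n_called else 0.0
    return accuracy, yield_


# Toy usage with made-up genotypes (not data from the study).
kept, masked = mask_snps([f"rs{i}" for i in range(100)], fraction=0.05)
truth = {("s1", "rs1"): "AG", ("s1", "rs2"): "CC",
         ("s2", "rs1"): "GG", ("s2", "rs2"): "CT"}
imputed = {("s1", "rs1"): "GA", ("s1", "rs2"): "CT",
           ("s2", "rs1"): "GG", ("s2", "rs2"): None}
accuracy, yield_ = accuracy_and_yield(truth, imputed)
```

Reporting the two numbers separately mirrors the trade-off noted in the Conclusions: a stricter calling rule can raise concordance among the calls that remain while lowering the share of genotypes called.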
Database: OpenAIRE