Cluster Buster: A Machine Learning Algorithm for Genotyping SNPs from Raw Data.

Authors: Martin J; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892., Kuznetsov N; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892.; DataTecnica LLC, Washington, DC, USA 20037., Levine K; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892.; DataTecnica LLC, Washington, DC, USA 20037., Koretsky MJ; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892.; DataTecnica LLC, Washington, DC, USA 20037., Hong S; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892., Nalls MA; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892.; DataTecnica LLC, Washington, DC, USA 20037., Vitale D; Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA 20892.; DataTecnica LLC, Washington, DC, USA 20037.
Language: English
Source: bioRxiv: the preprint server for biology [bioRxiv] 2024 Aug 26. Date of Electronic Publication: 2024 Aug 26.
DOI: 10.1101/2024.08.23.609429
Abstract: Genotyping single nucleotide polymorphisms (SNPs) is fundamental to disease research, as researchers seek to establish links between genetic variation and disease. Although bead-based SNP genotyping and the Genome Studio software have significantly advanced genome technology, some SNPs still fail to be genotyped, resulting in "no-calls" that impede downstream analyses. To recover these genotypes, we introduce Cluster Buster, a genotyping neural network and visual inspection system designed to improve the quality of neurodegenerative disease (NDD) research. Concordance analysis with whole genome sequencing (WGS) and imputed genotypes validated the reliability of predicted genotypes, with dozens of high-performing SNPs across the LRRK2, APOE, and GBA loci achieving at least 90% concordance per SNP location. Further analysis of concordance between Genome Studio genotypes and imputed and WGS genotypes revealed discrepancies between the genotyping technologies, highlighting the need to apply Cluster Buster selectively, at SNP locations chosen by their concordance rates. Cluster Buster's implementation significantly reduces the manual labor of recovering no-call SNPs, refining genotype quality for the Global Parkinson's Genetics Program (GP2). This system facilitates better imputation and GWAS outcomes, ultimately contributing to a deeper understanding of genetic factors in NDDs.
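Illustrative note: the per-SNP concordance check described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary-based data layout, the genotype encoding ("AA", "AB", "BB"), and the SNP and sample identifiers are all hypothetical. Only the 90% threshold is taken from the abstract.

```python
from collections import defaultdict


def per_snp_concordance(predicted, reference):
    """Fraction of samples per SNP where predicted and reference genotypes agree.

    Both arguments are assumed (hypothetically) to be dicts mapping
    (snp_id, sample_id) -> genotype string such as "AA", "AB", or "BB".
    Pairs missing from the reference are skipped rather than counted as errors.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for (snp_id, sample_id), pred in predicted.items():
        ref = reference.get((snp_id, sample_id))
        if ref is None:
            continue  # no reference genotype (e.g. WGS or imputed) for this pair
        totals[snp_id] += 1
        if pred == ref:
            matches[snp_id] += 1
    return {snp: matches[snp] / totals[snp] for snp in totals}


def select_snps(concordance, threshold=0.90):
    """Keep SNPs whose concordance meets the threshold (90% in the abstract)."""
    return sorted(snp for snp, rate in concordance.items() if rate >= threshold)


# Hypothetical toy data: two SNPs, two samples each.
predicted = {
    ("snp_a", "s1"): "AA", ("snp_a", "s2"): "AB",
    ("snp_b", "s1"): "BB", ("snp_b", "s2"): "AB",
}
reference = {
    ("snp_a", "s1"): "AA", ("snp_a", "s2"): "AB",
    ("snp_b", "s1"): "BB", ("snp_b", "s2"): "BB",
}

rates = per_snp_concordance(predicted, reference)
kept = select_snps(rates)
print(rates)
print(kept)
```

Filtering per SNP rather than on a single global rate mirrors the abstract's point that predicted genotypes should be used only at locations where concordance is high.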
Competing Interests: KL, DV, MAN, and MJK participated in this project under a competitive contract awarded to Data Tecnica International LLC by the National Institutes of Health to support open-access scientific research. MAN also owns stock in Character Bio Inc and Neuron23 Inc.
Database: MEDLINE