A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data.

Authors: Battle SL; McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA., Puiu D; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA., Verlouw J; Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands., Broer L; Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands., Boerwinkle E; Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA., Taylor KD; The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA., Rotter JI; The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA., Rich SS; Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA., Grove ML; Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA., Pankratz N; Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA., Fetterman JL; Evans Department of Medicine and the Whitaker Cardiovascular Institute, Boston University School of Medicine, Boston, MA, USA., Liu C; Framingham Heart Study, Boston University School of Medicine, Boston, MA, USA., Arking DE; McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Language: English
Source: NAR genomics and bioinformatics [NAR Genom Bioinform] 2022 May 17; Vol. 4 (2), pp. lqac034. Date of Electronic Publication: 2022 May 17 (Print Publication: 2022).
DOI: 10.1093/nargab/lqac034
Abstract: Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have the same variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantification of mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and by recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and re-call heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions, and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Compared with existing pipelines, our bioinformatics pipeline more accurately captures features of mitochondrial genetics that are important in understanding how mitochondrial dysfunction contributes to disease.
(© The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.)
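Note: the quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The 5%/95% classification thresholds, the coverage-ratio copy number estimate, the shift value, and all function names are assumptions made here for illustration; only the definitions of heteroplasmy and homoplasmy and the idea of remapping to a circularized chrM come from the abstract.

```python
# Illustrative sketch only; function names and thresholds are hypothetical,
# not taken from the published pipeline.

CHRM_LENGTH = 16569  # length of the human mitochondrial reference (rCRS)


def heteroplasmy_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def classify_variant(fraction: float, low: float = 0.05, high: float = 0.95) -> str:
    """Label a variant by its allele fraction.

    The cutoffs `low` and `high` are assumed values, not the paper's.
    """
    if fraction >= high:
        return "homoplasmic"
    if fraction >= low:
        return "heteroplasmic"
    return "below_threshold"


def mtdna_copy_number(mt_coverage: float, autosomal_coverage: float,
                      ploidy: int = 2) -> float:
    """Coverage-ratio estimate of mtDNA copies per cell.

    Assumes a diploid nuclear genome; this is a commonly used
    approximation, not necessarily the estimator used in the paper.
    """
    if autosomal_coverage <= 0:
        raise ValueError("autosomal_coverage must be positive")
    return ploidy * mt_coverage / autosomal_coverage


def unshift_position(shifted_pos: int, shift: int,
                     genome_length: int = CHRM_LENGTH) -> int:
    """Map a 1-based position on a rotated circular reference back to
    the original coordinate, assuming the rotated reference begins at
    original position shift + 1."""
    return (shifted_pos - 1 + shift) % genome_length + 1


if __name__ == "__main__":
    frac = heteroplasmy_fraction(alt_reads=25, total_reads=100)
    print(frac, classify_variant(frac))
    print(mtdna_copy_number(mt_coverage=3000.0, autosomal_coverage=30.0))
    print(unshift_position(shifted_pos=8570, shift=8000))
```

Rotating the reference and mapping a second time is one way to cover reads that span the artificial break point of a circular genome represented as a linear sequence; `unshift_position` shows only the coordinate bookkeeping that such an approach would need.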
Database: MEDLINE