Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples.

Authors: Coorens THH (Wellcome Sanger Institute, Hinxton, UK; Broad Institute of MIT and Harvard, Cambridge, MA, USA; tcoorens@broadinstitute.org); Spencer Chapman M (Wellcome Sanger Institute, Hinxton, UK; Department of Haematology, Barts Health NHS Trust, London, UK; Department of Haemato-oncology, Barts Cancer Institute, Queen Mary University of London, London, UK; ms56@sanger.ac.uk); Williams N (Wellcome Sanger Institute, Hinxton, UK); Martincorena I (Wellcome Sanger Institute, Hinxton, UK); Stratton MR (Wellcome Sanger Institute, Hinxton, UK); Nangalia J (Wellcome Sanger Institute, Hinxton, UK; Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK; Department of Haematology, University of Cambridge, Cambridge, UK); Campbell PJ (Wellcome Sanger Institute, Hinxton, UK; Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK; pc8@sanger.ac.uk)
Language: English
Source: Nature Protocols [Nat Protoc] 2024 Jun; Vol. 19 (6), pp. 1866-1886. Date of Electronic Publication: 2024 Feb 23.
DOI: 10.1038/s41596-024-00962-8
Abstract: Phylogenetic trees are a powerful means to display the evolutionary history of species, pathogens and, more recently, individual cells of the human body. Whole-genome sequencing of laser capture microdissections or expanded stem cells has allowed the discovery of somatic mutations in clones, which can be used as natural barcodes to reconstruct the developmental history of individual cells. Here we describe Sequoia, our pipeline to reconstruct lineage trees from clones of normal cells. Candidate somatic mutations are called against the human reference genome and filtered to exclude germline mutations and artifactual variants. These filtered somatic mutations form the basis for phylogeny reconstruction using a maximum parsimony framework. Lastly, we use a maximum likelihood framework to explicitly map mutations to branches in the phylogenetic tree. The resulting phylogenies can then serve as a basis for many subsequent analyses, including investigating embryonic development, tissue dynamics in health and disease, and mutational signatures. Sequoia can be readily applied to any clonal somatic mutation dataset, including single-cell DNA sequencing datasets, using the commands and scripts provided. Moreover, Sequoia is highly flexible and can be easily customized. Typically, the runtime of the core script ranges from minutes to an hour for datasets with a moderate number (50,000-150,000) of variants. Competent bioinformatic skills, including in-depth knowledge of the R programming language, are required. A high-performance computing cluster (one that is capable of running mutation-calling algorithms and other aspects of the analysis at scale) is also required, especially if handling large datasets.
(© 2024. Springer Nature Limited.)
Database: MEDLINE
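
Illustrative sketch (not part of the record): the abstract's last two steps, excluding germline-like variants and mapping each mutation to a branch by maximum likelihood, can be pictured with a toy example. This is not Sequoia's code, which is written in R and whose statistical models the abstract does not specify. The function names, the three-sample tree ((A,B),C), the presence/absence genotype calls and the error rates `fp` and `fn` are all assumptions made for illustration.

```python
import math

# Hypothetical toy tree ((A,B),C): each branch is named after the set of
# samples (clones) that descend from it.
BRANCHES = {
    "A": {"A"},
    "B": {"B"},
    "C": {"C"},
    "AB": {"A", "B"},
}


def is_germline_like(genotype):
    """Flag a variant called in every sample; such variants carry no
    information about the tree and are excluded before tree building."""
    return all(genotype.values())


def assign_to_branch(genotype, branches=BRANCHES, fp=0.01, fn=0.05):
    """Return the branch that maximizes the likelihood of the observed calls.

    Assumed error model: a sample below the branch is called with
    probability 1 - fn, a sample elsewhere with probability fp.
    """
    best_name, best_loglik = None, -math.inf
    for name, descendants in branches.items():
        loglik = 0.0
        for sample, called in genotype.items():
            if sample in descendants:
                loglik += math.log(1 - fn) if called else math.log(fn)
            else:
                loglik += math.log(fp) if called else math.log(1 - fp)
        if loglik > best_loglik:
            best_name, best_loglik = name, loglik
    return best_name


if __name__ == "__main__":
    variants = {
        "v1": {"A": True, "B": True, "C": True},    # present everywhere
        "v2": {"A": True, "B": True, "C": False},   # shared by A and B
        "v3": {"A": False, "B": False, "C": True},  # private to C
    }
    for variant_id, genotype in variants.items():
        if is_germline_like(genotype):
            print(variant_id, "excluded as germline-like")
        else:
            print(variant_id, "mapped to branch", assign_to_branch(genotype))
```

A real dataset would work from read counts per sample rather than binary calls, and the tree itself would first be inferred by maximum parsimony; the sketch takes the tree as given to keep the branch-assignment step visible.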