Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches
Author: | Stephanie N Kravitz, Hunter R. Underhill, Randy L. Jensen, Brent S. Pedersen, Mary P. Bronner, Preetida J. Bhetariya, Aaron R. Quinlan, Joseph Brown, Gabor T. Marth |
---|---|
Year of publication: | 2020 |
Subject: | 0301 basic medicine; lcsh:QH426-470; Systems biology; DNA Mutational Analysis; lcsh:Medicine; Sample (statistics); Computational biology; Web Browser; Biology; Genome; Germline; 03 medical and health sciences; 0302 clinical medicine; Neoplasms; Genetic variation; Genetics; Humans; 1000 Genomes Project; Molecular Biology; Genetics (clinical); Genome, Human; lcsh:R; Computational Biology; Genetic Variation; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis, DNA; Human genetics; lcsh:Genetics; Germ Cells; 030104 developmental biology; Molecular Medicine; Pairwise comparison; Software; Algorithms; 030217 neurology & neurosurgery |
Source: | Genome Medicine, Vol 12, Iss 1, Pp 1-9 (2020) |
ISSN: | 1756-994X |
DOI: | 10.1186/s13073-020-00761-2 |
Description: | Background: When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the sample donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics. Results: We introduce the tool and demonstrate its utility on a cohort of five glioma samples, each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project. Conclusions: Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at github.com/brentp/somalier. |
Database: | OpenAIRE |
External link: |
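
The description above mentions comparing per-sample sketches with "multiple relatedness metrics" but does not define them. The following is a minimal Python sketch of what such a pairwise comparison could look like. It assumes each sketch is a list of genotypes at the same ordered sites, coded 0 (homozygous reference), 1 (heterozygous), 2 (homozygous alternate), or -1 (unknown). The function name `compare_sketches`, the genotype coding, and the particular relatedness formula are illustrative assumptions, not Somalier's actual file format or API.

```python
from typing import Dict, Sequence


def compare_sketches(a: Sequence[int], b: Sequence[int]) -> Dict[str, float]:
    """Compare two genotype sketches site by site (hypothetical helper).

    Genotype coding (an assumption for this example):
    0 = hom-ref, 1 = het, 2 = hom-alt, -1 = unknown.
    """
    if len(a) != len(b):
        raise ValueError("sketches must cover the same sites")

    n = ibs0 = ibs2 = shared_hets = hets_a = hets_b = 0
    for ga, gb in zip(a, b):
        if ga < 0 or gb < 0:
            continue  # skip sites where either sample is unknown
        n += 1
        hets_a += ga == 1
        hets_b += gb == 1
        if ga == gb:
            ibs2 += 1  # identical genotypes: both alleles shared
            shared_hets += ga == 1
        elif abs(ga - gb) == 2:
            ibs0 += 1  # opposite homozygotes: no alleles shared

    # One simple relatedness estimate (an assumption of this sketch):
    # shared heterozygous sites are evidence for relatedness, opposite
    # homozygous sites are evidence against it.
    denom = min(hets_a, hets_b)
    relatedness = (shared_hets - 2 * ibs0) / denom if denom else float("nan")
    return {
        "n": n,
        "ibs0": ibs0,
        "ibs2": ibs2,
        "shared_hets": shared_hets,
        "relatedness": relatedness,
    }


if __name__ == "__main__":
    tumor = [0, 1, 2, 1, 1, 0]
    normal = [0, 1, 2, 1, 1, 0]
    print(compare_sketches(tumor, normal))
```

Under these assumptions, two sketches from the same individual give a relatedness of 1.0, while a pair with many opposite-homozygous sites gives a value at or below zero. A tumor sample scoring low against its supposed matched normal would be a candidate sample swap.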