Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets
Author: | Sarah K. Nyquist, Ibrahim Numanagić, Deniz Yorukoglu, Emily Berger, Bonnie Berger, Manolis Kellis, Lillian Zhang, Alex K. Shalek |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Science; General Physics and Astronomy; Genomics; Computational biology; Allelic Imbalance; Biology; Polymorphism, Single Nucleotide; Genome; Article; General Biochemistry, Genetics and Molecular Biology; Polyploidy; 03 medical and health sciences; 0302 clinical medicine; Polyploid; Chromosome (genetic algorithm); Genome assembly algorithms; Databases, Genetic; Computational models; Humans; RNA-Seq; lcsh:Science; Exome; Gene; 030304 developmental biology; 0303 health sciences; Models, Statistical; Multidisciplinary; Models, Genetic; Sequence Analysis, RNA; Haplotype; General Chemistry; Diploidy; Data processing; Sequence annotation; Haplotypes; lcsh:Q; K562 Cells; Software; Algorithms; 030217 neurology & neurosurgery |
Source: | Nature Communications, Vol 11, Iss 1, Pp 1-9 (2020) |
ISSN: | 2041-1723 |
DOI: | 10.1038/s41467-020-18320-z |
Description: | Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases. Haplotype reconstruction of distant genetic variants is problematic in short-read sequencing. Here, the authors describe HapTree-X, a probabilistic framework that uses differential allele-specific expression to better reconstruct haplotypes from diploid and polyploid genomes. |
Database: | OpenAIRE |
External link: |