Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets

Authors: Sarah K. Nyquist, Ibrahim Numanagić, Deniz Yorukoglu, Emily Berger, Bonnie Berger, Manolis Kellis, Lillian Zhang, Alex K. Shalek
Language: English
Year of publication: 2020
Subject:
Source: Nature Communications, Vol 11, Iss 1, Pp 1-9 (2020)
ISSN: 2041-1723
DOI: 10.1038/s41467-020-18320-z
Description: Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
Haplotype reconstruction of distant genetic variants is problematic in short-read sequencing. Here, the authors describe HapTree-X, a probabilistic framework that uses differential allele-specific expression to better reconstruct paternal haplotypes from diploid and polyploid genomes.
Database: OpenAIRE