High Quality Phasing Using Linked-Read Whole Genome Sequencing of Patient Cohorts Informs Genetic Understanding of Complex Traits

Authors: Scott Mastromatteo, Angela Chen, Jiafen Gong, Fan Lin, Bhooma Thiruvahindrapuram, Wilson WL Sung, Joe Whitney, Zhuozhi Wang, Rohan V Patel, Katherine Keenan, Anat Halevy, Naim Panjwani, Julie Avolio, Cheng Wang, Guillaume Côté-Maurais, Stéphanie Bégin, Damien Adam, Emmanuelle Brochiero, Candice Bjornson, Mark Chilvers, April Price, Michael Parkins, Richard van Wylick, Dimas Mateos-Corral, Daniel Hughes, Mary Jane Smith, Nancy Morrison, Elizabeth Tullis, Anne L Stephenson, Pearce Wilcox, Bradley S Quon, Winnie M Leung, Melinda Solomon, Lei Sun, Felix Ratjen, Lisa J Strug
Year of publication: 2022
Description: Phasing of heterozygous alleles is critical for interpretation of cis-effects of disease-relevant variation. For population studies, phase is often inferred from external data, but read-based phasing approaches that span long genomic distances would be more accurate because they enable both genotype and phase to be obtained from a single dataset. To demonstrate how read-based phasing can provide functional insights, we sequenced 477 individuals with Cystic Fibrosis (CF) using linked-read sequencing. We benchmark read-based phasing with different short- and long-read sequencing technologies, prioritize linked-read technology as the most informative, and produce a benchmark phase call set from reference sample HG002 for the community. The 477 samples display an average phase block N50 of 4.39 Mb. We use these samples to construct a graph representation of CFTR haplotypes, which facilitates understanding of complex CF alleles. Fine-mapping and phasing of the chr7q35 trypsinogen locus associated with CF meconium ileus demonstrates that a 20 kb deletion and a PRSS2 missense variant p.Thr8Ile (rs62473563) independently contribute to meconium ileus risk (p=0.0028 and p=0.011, respectively) and are PRSS2 pancreas eQTLs (p=9.5e-7 and p=1.4e-4, respectively), explaining the mechanism by which these polymorphisms contribute to CF. Phase enables access to haplotypes that can be used for genome graph or reference panel construction, for identification of cis-effects, and for understanding disease-associated loci. The phase information from linked reads provides a causal explanation for variation at a CF-relevant locus, which also has implications for the genetic basis of non-CF pancreatitis, to which this locus has been reported to contribute.
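The phase block N50 quoted in the description can be illustrated with a minimal sketch. It assumes the common definition (the length L such that phase blocks of length L or longer cover at least half of the total phased length); the record does not say which definition or tool the authors used, and some tools normalize by genome length instead. The function name `phase_block_n50` and the example lengths are hypothetical, not taken from the study.

```python
def phase_block_n50(block_lengths):
    """Return the N50 of a collection of phase block lengths (in bp).

    Assumed definition: sort blocks from longest to shortest and return
    the length of the block at which the running total first reaches
    half of the summed length of all blocks.
    """
    # Ignore empty or invalid blocks; sort longest first.
    lengths = sorted((n for n in block_lengths if n > 0), reverse=True)
    if not lengths:
        raise ValueError("no phase blocks with positive length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Illustrative block lengths only (not data from the study).
example_blocks = [2, 3, 4, 5, 6]
print(phase_block_n50(example_blocks))
```

A per-sample N50 computed this way could then be averaged across samples to obtain a cohort-level figure comparable in form to the one reported.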
Database: OpenAIRE