A Pangenome Reference of 36 Chinese Populations

Authors: Shuhua Xu, Yang Gao, Xiaofei Yang, Hao Chen, Xinjiang Tan, Zhaoqing Yang, Lian Deng, Yimin Wang, Baonan Wang, Songyang Li, Yuhang Cui, Yuwen Pan, Sen Ma, Hao Sun, Shuang Kong, Xiaohan Zhao, Dongdong Wu, Shaoyuan Wu, Bingyin Shi, Li Jin, Yan Lu, Jiayou Chu, Kai Ye
Year of publication: 2022
DOI: 10.21203/rs.3.rs-2097264/v1
Abstract: Human genomics is undergoing a paradigm shift from a single reference sequence to a pangenome, but populations of Asian ancestry remain underrepresented. Here, we present the first effort (Phase I) of the Chinese Pangenome Consortium (CPC): a collection of 116 high-quality, haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With >30.65× High-Fidelity long-read sequence coverage, an average contiguity N50 of >35.63 Mb, and an average total size of 3.01 Gb, the CPC core assemblies cover ~96.54% of the latest reference sequence, GRCh38, and ~93.59% of the telomere-to-telomere haploid assembly T2T-CHM13. Moreover, the CPC Phase I data add 189 million base pairs of euchromatic polymorphic sequence and 1,367 protein-coding gene duplications to GRCh38. From the CPC pangenome, we also identify ~15.9 million small variants and ~78 thousand structural variants (SVs), of which ~6.1 million (38.0%) small variants and ~25 thousand (32.4%) SVs are not reported in the pangenome reference recently released by the Human Pangenome Reference Consortium (HPRC)[1]. The CPC data demonstrate a marked increase in the discovery of novel or missing sequences when individuals from underrepresented minority ethnic groups are included, suggesting that a more comprehensive sampling effort is needed for both CPC and HPRC.
Database: OpenAIRE