Assembly of a pan-genome from deep sequencing of 910 humans of African descent
Author: | Steven L. Salzberg, Nadia N. Hansel, Albert M. Levin, Candelaria Vergara, Monica Campbell, Kathleen C. Barnes, Valentin Antonescu, Alvaro Mayorga, Victor E. Ortega, Esteban G. Burchard, Edwin Francisco Herrera-Paz, Cassandra Foster, Javier Marrugo, Michelle Daya, Margaret A. Taub, Christopher O. Olopade, Georgia M. Dunston, Marilyn G. Foreman, Mezbah U. Faruque, Carole Ober, Eugene R. Bleecker, Jennifer Knight-Madden, Rasika A. Mathias, Sameer Chavan, Deborah A. Meyers, Dan L. Nicolae, Lorraine B. Ware, Maria Yazdanbakhsh, Ingo Ruczinski, Celeste Eng, Daniela Puiu, Terri H. Beaty, L. Keoki Williams, Harold Watson, Nicholas Rafaels, James G. Wilson, Leslie A. Lange, Tina V. Hartert, Olufunmilayo I. Olopade, Maria Ilma Araujo, Luis Caraballo, Juliet Forman, Rachel M. Sherman, Ricardo Riccio Oliveira, Meher Preethi Boorgula, Jean G. Ford |
---|---|
Year of publication: | 2017 |
Subject: | 0303 health sciences; Contig; Genome, Human; food and beverages; Black People; High-Throughput Nucleotide Sequencing; Genomics; Computational biology; Sequence Analysis, DNA; Biology; Genome; DNA sequencing; 03 medical and health sciences; 0302 clinical medicine; Intergenic region; Gene mapping; Genetics; Humans; Human genome; 030217 neurology & neurosurgery; 030304 developmental biology; Reference genome |
Source: | Nature Genetics, 51(1), 30 |
ISSN: | 1546-1718 |
Description: | We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic. Assembly of a pan-genome from 910 humans of African descent identifies 296.5 Mb of novel DNA mapping to 125,715 distinct contigs. This African pan-genome contains ~10% more DNA than the current human reference genome. |
Database: | OpenAIRE |
External link: |
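
The headline figures in the description can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline. It takes the read, contig, and base-pair counts directly from the abstract. The reference genome size (`ASSUMED_REFERENCE_BP`, about 3.1 Gb for GRCh38) is an assumption, since the record does not state it, and the function name `summarize` is hypothetical.

```python
# Figures quoted in the abstract above.
NOVEL_BP = 296_485_284        # novel sequence, in base pairs
NOVEL_CONTIGS = 125_715       # distinct novel contigs
TOTAL_READS = 1.19e12         # reads aligned to GRCh38
INDIVIDUALS = 910             # sequenced individuals

# Assumption: approximate total size of GRCh38; not stated in the record.
ASSUMED_REFERENCE_BP = 3.1e9


def summarize():
    """Derive summary ratios from the abstract's figures (hypothetical helper)."""
    return {
        "novel_mb": NOVEL_BP / 1e6,
        "mean_contig_bp": NOVEL_BP / NOVEL_CONTIGS,
        "reads_per_individual": TOTAL_READS / INDIVIDUALS,
        "fraction_of_reference": NOVEL_BP / ASSUMED_REFERENCE_BP,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.3f}")
```

Under the assumed reference size, the novel sequence comes to a little under 10% of the reference, which is consistent with the "~10%" stated in the description. The mean contig length works out to roughly 2.4 kb.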