Effect of sequence depth and length in long-read assembly of the maize inbred NC358
Author: | R. Kelly Dawe, Jianing Liu, Kapeel Chougule, Adam M. Phillippy, Chen-Shan Chin, Sharon Wei, Brian P. Walenz, Sergey Koren, Samantha J. Snodgrass, Brett T. Hannigan, Joshua C. Stein, Arkarachai Fungtammasan, Nancy Manchanda, Arun S. Seetharam, Margaret R. Woodhouse, Kevin Fengler, Sarah Pedersen, Candice N. Hirsch, Shujun Ou, Matthew B. Hufford, Doreen Ware, Victor Llaca, Amanda M. Gilbert, David E. Hufnagel |
---|---|
Language: | English |
Publication year: | 2020 |
Subject: | 0301 basic medicine; 0106 biological sciences; Transposable element; Agricultural genetics; Computer science; Heterochromatin; Science; General Physics and Astronomy; Sequence assembly; Computational biology; Biology; 01 natural sciences; Genome; Zea mays; General Biochemistry, Genetics and Molecular Biology; Article; 03 medical and health sciences; Centromere; Genome assembly algorithms; Resource allocation (computer); Inbreeding; lcsh:Science; Gene; 030304 developmental biology; Sequence (medicine); Repetitive Sequences, Nucleic Acid; 2. Zero hunger; 0303 health sciences; Multidisciplinary; Base Sequence; High-Throughput Nucleotide Sequencing; General Chemistry; 030104 developmental biology; DNA Transposable Elements; lcsh:Q; Line (text file); Limited resources; Genome, Plant; 010606 plant biology & botany |
Source: | Nature Communications, Vol 11, Iss 1, Pp 1-10 (2020) |
ISSN: | 2041-1723 |
Description: | Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show that high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature. Sequence depth and read length determine the quality of genome assembly. Here, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembly of complex plant genomes, using maize as an example, in order to allocate finite resources. |
Database: | OpenAIRE |
External link: |
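
The description characterizes each dataset by two quantities: genomic depth (20× to 75×) and N50 subread length (11–21 kb). As a minimal sketch of how these are conventionally computed, and not the authors' pipeline, the Python below takes a list of read lengths and a genome size. The function names `n50` and `genomic_depth` and the toy numbers are illustrative assumptions; no values from the study are used.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def genomic_depth(lengths, genome_size):
    """Return average depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


# Toy data (hypothetical, in arbitrary base units).
reads = [2, 3, 4, 5, 6]
print(n50(reads))                # 5   (6 + 5 = 11 >= 20 / 2)
print(genomic_depth(reads, 10))  # 2.0 (20 bases over a 10-base genome)
```

With real data, the read lengths would come from the sequencing output and the genome size from an independent estimate; both inputs are left to the caller here.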