HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly.

Authors: Sim SB; USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA. sheina.sim@usda.gov., Corpuz RL; USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA., Simmonds TJ; USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA.; Oak Ridge Institute for Science and Education, Oak Ridge Associated Universities, Oak Ridge, TN, 37830, USA., Geib SM; USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, 64 Nowelo Street, Hilo, HI, 96720, USA.
Language: English
Source: BMC genomics [BMC Genomics] 2022 Feb 22; Vol. 23 (1), pp. 157. Date of Electronic Publication: 2022 Feb 22.
DOI: 10.1186/s12864-022-08375-1
Abstract: Background: Pacific Biosciences HiFi read technology is currently the industry standard for high accuracy long-read sequencing that has been widely adopted by large sequencing and assembly initiatives for generation of de novo assemblies in non-model organisms. Though adapter contamination filtering is routine in traditional short-read analysis pipelines, it has not been widely adopted for HiFi workflows.
Results: Analysis of 55 publicly available HiFi datasets revealed that a read-sanitation step to remove sequence artifacts derived from PacBio library preparation from read pools is necessary as adapter sequences can be erroneously integrated into assemblies.
Conclusions: Here we describe the nature of adapter contaminated reads, their consequences in assembly, and present HiFiAdapterFilt, a simple and memory efficient solution for removing adapter contaminated reads prior to assembly.
(© 2022. The Author(s).)
Database: MEDLINE