Accelerating the Alignment Processing Speed of the Comprehensive End-to-End Whole-Genome Bisulfite Sequencing Pipeline, wg-blimp

Authors: Jake D. Lehle, John R. McCarrey
Year of publication: 2022
Description:
Background: Analyzing whole-genome bisulfite sequencing (WGBS) datasets is a time-intensive process because of the complexity and size of the raw input sequencing files and the lengthy read alignment step during downstream data processing. This is particularly challenging with WGBS data, because the conversion of all unmethylated Cs to Ts genome-wide makes read alignment a cumbersome computational process that can take up to a full work week of computing time. The objective of the study described here was to modify the read alignment algorithm used by the wg-blimp pipeline to shorten the time required to complete this phase while retaining overall read alignment accuracy.

Results: Here we report improvements to the recently published pipeline wg-blimp (whole-genome bisulfite sequencing methylation analysis pipeline), achieved by replacing the bwa-meth aligner with the faster gemBS aligner. This change to the wg-blimp pipeline led to a >7x acceleration in sample processing speed when scaled to larger publicly available FASTQ datasets containing 80–160 million (M) reads. Importantly, this acceleration was achieved while maintaining nearly identical accuracy of properly mapped reads compared with data from the original pipeline.

Conclusion: The modifications to the wg-blimp pipeline reported here combine the speed and accuracy of the gemBS aligner with the comprehensive analysis and data visualization features of wg-blimp, providing a significantly accelerated pipeline that produces high-quality data much more rapidly without compromising read alignment accuracy.
Database: OpenAIRE
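Why C-to-T conversion complicates WGBS alignment can be illustrated with a toy sketch of the "three-letter" strategy that bisulfite aligners commonly rely on: convert both the read and the reference in silico so that conversion-induced mismatches disappear, then recover methylation calls by comparing the original read with the reference. This is a simplified illustration under stated assumptions (forward strand only, the ungapped alignment position already known, and hypothetical sequences and function names); it is not the bwa-meth or gemBS implementation. It also hints at the computational cost: the converted sequences use only three letters, so they are less unique, and a real aligner must search against converted versions of both strands.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, strand: str = "+") -> str:
    """Hypothetical helper: in silico conversion treating every C (forward strand)
    or every G (reverse strand, seen on the forward representation) as unmethylated."""
    seq = seq.upper()
    if strand == "+":
        return seq.replace("C", "T")
    if strand == "-":
        return seq.replace("G", "A")
    raise ValueError("strand must be '+' or '-'")


def mismatches(a: str, b: str) -> int:
    """Count position-wise mismatches between two equal-length, ungapped sequences."""
    return sum(x != y for x, y in zip(a, b))


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """Hypothetical helper: for each reference C covered by a forward-strand read,
    True if the read kept C (methylated), False if it shows T (unmethylated)."""
    calls: dict[int, bool] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in "CT":
            calls[i] = read_base == "C"
    return calls


# Toy example: the two CpG Cs (positions 1 and 5) are methylated and stay C;
# the non-CpG Cs (positions 8 and 9) are unmethylated and read as T.
reference = "ACGTTCGACCTG"
read = "ACGTTCGATTTG"

print(mismatches(read, reference))  # bisulfite mismatches against the plain reference
print(mismatches(bisulfite_convert(read), bisulfite_convert(reference)))  # three-letter space
print(call_methylation(reference, read))
```

Against the unconverted reference the read shows two mismatches that are purely a product of bisulfite treatment; in the converted three-letter space it matches exactly, and the original read is still needed to decide which Cs were methylated.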