Fast-bonito: A faster deep learning based basecaller for nanopore sequencing

Authors: Nan Qiao, Lei Zhang, Wenjun He, Walid Abbas Zaher, Xinyuan Lin, Zhimeng Xu, Ashish Koshy, Xin Meng, Yuting Mai, Joseph Mafofo, Yi Li, Chi Xu, Denghui Liu
Year of publication: 2021
Subject:
Source: Artificial Intelligence in the Life Sciences, Vol 1, Iss, Pp 100011-(2021)
ISSN: 2667-3185
DOI: 10.1016/j.ailsci.2021.100011
Description: Nanopore sequencing from Oxford Nanopore Technologies (ONT) is a promising third-generation sequencing (TGS) technology that generates longer reads than next-generation sequencing (NGS). A basecaller is software that translates the raw electrical current signals into nucleotide sequences, and its accuracy is crucial to downstream analysis. Bonito is a deep learning-based basecaller recently developed by ONT. Its neural network architecture consists of a single convolutional layer followed by three stacked bidirectional gated recurrent unit (GRU) layers. Although Bonito achieves state-of-the-art basecalling accuracy, it is too slow for production use. We therefore developed Fast-Bonito, using neural architecture search (NAS) to find a new neural network backbone and training it from scratch with several advanced deep learning training techniques. The resulting Fast-Bonito model balances speed and accuracy. Fast-Bonito was 153.8% faster than the original Bonito on an NVIDIA V100 GPU, and 565% faster when running on a HUAWEI Ascend 910 NPU. The accuracy of Fast-Bonito was also slightly higher than that of Bonito. We have made Fast-Bonito open source, hoping it will boost the adoption of TGS in both academia and industry.
Database: OpenAIRE
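
Note on the reported speedups: the description gives gains as "N% faster" without defining the baseline metric. The short Python sketch below shows one way to read those figures, assuming (this is an assumption, not stated in the record) that "N% faster" means throughput increased by N% relative to the original Bonito. The helper names `speedup_factor` and `runtime_fraction` are hypothetical and introduced only for illustration.

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert 'N% faster' into a throughput multiplier.

    Assumption: the percentage is the relative throughput gain over the
    baseline, so 100% faster means 2x the baseline throughput.
    """
    return 1.0 + percent_faster / 100.0


def runtime_fraction(percent_faster: float) -> float:
    """Fraction of the baseline runtime needed for the same workload."""
    return 1.0 / speedup_factor(percent_faster)


# Figures quoted in the description above.
reported = [
    ("NVIDIA V100 GPU", 153.8),
    ("HUAWEI Ascend 910 NPU", 565.0),
]

for device, pct in reported:
    print(
        f"{device}: {speedup_factor(pct):.3f}x throughput, "
        f"{runtime_fraction(pct):.1%} of baseline runtime"
    )
```

Under this reading, the percentages map to multipliers of roughly 2.5x and 6.7x. If the authors instead measured a reduction in wall-clock time, the conversion would differ, so the full paper should be consulted for the exact definition.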