HELLO: improved neural network architectures and methodologies for small variant calling
Authors: | Steven S. Lumetta, Eric W. Klee, Anand Ramachandran, Deming Chen |
---|---|
Year of publication: | 2021 |
Subject: |
QH301-705.5
Computer science; Computer applications to medicine. Medical informatics; R858-859.7; Inference; Machine learning; computer.software_genre; Biochemistry; Field (computer science); Personalization; 03 medical and health sciences; 0302 clinical medicine; INDEL Mutation; Illumina; Structural Biology; Variant calling; Deep neural networks; Humans; Biology (General); Hybrid variant calling; Indel; Molecular Biology; 030304 developmental biology; PacBio; 0303 health sciences; Artificial neural network; business.industry; Methodology Article; Applied Mathematics; Deep learning; High-Throughput Nucleotide Sequencing; Pipeline (software); Computer Science Applications; 030220 oncology & carcinogenesis; Neural Networks, Computer; Artificial intelligence; business; computer |
Source: | BMC Bioinformatics, Vol 22, Iss 1, Pp 1-31 (2021) |
ISSN: | 1471-2105 |
DOI: | 10.1186/s12859-021-04311-4 |
Description: | Background: Modern Next-Generation and Third-Generation Sequencing methods, such as the Illumina and PacBio Circular Consensus Sequencing platforms, provide accurate sequencing data. Parallel developments in Deep Learning have enabled the application of Deep Neural Networks to variant calling, surpassing the accuracy of classical approaches in many settings. DeepVariant, arguably the most popular among such methods, transforms the problem of variant calling into one of image recognition, where a Deep Neural Network analyzes sequencing data that is formatted as images, achieving high accuracy. In this paper, we explore an alternative approach to designing Deep Neural Networks for variant calling, where we use meticulously designed Deep Neural Network architectures and customized variant inference functions that account for the underlying nature of sequencing data instead of converting the problem to one of image recognition. Results: Results from 27 whole-genome variant calling experiments spanning Illumina, PacBio and hybrid Illumina-PacBio settings suggest that our method allows vastly smaller Deep Neural Networks to outperform the Inception-v3 architecture used in DeepVariant for indel and substitution-type variant calls. For example, our method reduces the number of indel call errors by up to 18%, 55% and 65% for Illumina, PacBio and hybrid Illumina-PacBio variant calling, respectively, compared to a similarly trained DeepVariant pipeline. In these cases, our models are between 7 and 14 times smaller. Conclusions: We believe that the improved accuracy and problem-specific customization of our models will enable more accurate pipelines and further method development in the field. HELLO is available at https://github.com/anands-repo/hello |
Database: | OpenAIRE |