High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function.

Authors: Gao M; Georgia Institute of Technology, Atlanta, GA., Lund-Andersen P; University of Idaho, Moscow, ID., Morehead A; University of Missouri, Columbia, MO., Mahmud S; University of Missouri, Columbia, MO., Chen C; University of Missouri, Columbia, MO., Chen X; University of Missouri, Columbia, MO., Giri N; University of Missouri, Columbia, MO., Roy RS; University of Missouri, Columbia, MO., Quadir F; University of Missouri, Columbia, MO., Effler TC; Oak Ridge National Laboratory, Oak Ridge, TN., Prout R; Oak Ridge National Laboratory, Oak Ridge, TN., Abraham S; Oak Ridge National Laboratory, Oak Ridge, TN., Elwasif W; Oak Ridge National Laboratory, Oak Ridge, TN., Haas NQ; Oak Ridge National Laboratory, Oak Ridge, TN., Skolnick J; Georgia Institute of Technology, Atlanta, GA., Cheng J; University of Missouri, Columbia, MO., Sedova A; Oak Ridge National Laboratory, Oak Ridge, TN.
Language: English
Source: Workshop on Machine Learning in HPC Environments. Workshop on Machine Learning in HPC Environments [Workshop Mach Learn HPC Environ] 2021 Nov; Vol. 2021, pp. 46-57. Date of Electronic Publication: 2021 Dec 27.
DOI: 10.1109/mlhpc54614.2021.00010
Abstract: Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Database: MEDLINE