MACHINE LEARNING-BASED APPROACHES FOR ACCURATE PROTEIN STRUCTURE CLASSIFICATION AND ASSEMBLY

Author: Aderinwale, Tunde W
Year of publication: 2023
Subject:
DOI: 10.25394/pgs.22144964
Description: Proteins play a vital role in the functioning of living cells, and their analysis is crucial for gaining deeper insights into cellular processes. Determining their three-dimensional (3D) structures is a critical aspect of understanding proteins. This is important for developing new drugs, understanding disease processes, and many other applications. In this thesis, we propose a novel approach for comparing 3D protein structures. Our approach uses a deep neural network to accurately identify proteins with similar structures. It can rapidly compare two proteins, or a single protein against a database of millions of structures, identifying and ranking similar structures within minutes with improved accuracy over existing methods. Additionally, we have developed a method for assembling protein structures by combining Reinforcement Learning (RL) and LZerD. Our method formulates the assembly of a protein complex as an episode in the RL framework and uses precomputed pairwise models from LZerD to combine the chains until the complete complex is assembled. Our approach has shown improved results compared to similar methods for protein complexes with 3-5 chains. To further enhance our approach, we replaced LZerD with AlphaFold-Multimer and applied it to predict the structure of protein complexes with 6-15 chains. Few methods are capable of this type of large-scale protein complex assembly. Using AlphaFold-Multimer, we are able to generate and sample high-quality subcomponents, reducing the action space for the RL agent from 1,000 to 75, resulting in improved performance over similar methods and demonstrating the potential of our approach for analyzing large protein complexes.
Database: OpenAIRE
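
Note: the abstract describes complex assembly as an RL episode in which chains are combined, one step at a time, using precomputed pairwise models. The sketch below illustrates only that episode structure. It is not the thesis's implementation: the function names, the dictionary layout of the pairwise models, and the random action choice (standing in for a learned policy) are all assumptions made for illustration, and no reward or learning step is shown.

```python
import random


def run_episode(chains, pairwise_models, choose, rng):
    """Assemble a complex by attaching one unplaced chain per step.

    chains: list of chain identifiers.
    pairwise_models: hypothetical dict mapping (placed_chain, new_chain)
        to a list of candidate pairwise model ids.
    choose: policy function picking one action from the available list
        (a random stand-in here, not a trained RL agent).
    """
    placed = [chains[0]]  # start the episode from the first chain
    steps = []
    while len(placed) < len(chains):
        # An action attaches unplaced chain b to placed chain a via model m.
        actions = [
            (a, b, m)
            for a in placed
            for b in chains
            if b not in placed
            for m in pairwise_models.get((a, b), [])
        ]
        if not actions:
            break  # no pairwise model connects the remaining chains
        a, b, m = choose(actions, rng)
        placed.append(b)
        steps.append((a, b, m))
    return placed, steps


def random_policy(actions, rng):
    # Placeholder for a learned policy: pick any available action.
    return rng.choice(actions)


# Toy input: three chains with a few made-up pairwise model ids.
chains = ["A", "B", "C"]
pairwise_models = {
    ("A", "B"): [0, 1],
    ("A", "C"): [0],
    ("B", "C"): [0],
}
placed, steps = run_episode(chains, pairwise_models, random_policy, random.Random(0))
```

In this toy setup the size of `actions` at each step plays the role of the action space mentioned in the abstract: fewer, higher-quality candidate models per chain pair give the agent fewer choices to search over.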