AnnoPRO: an Innovative Strategy for Protein Function Annotation based on Image-like Protein Representation and Multimodal Deep Learning

Autor: Lingyan Zheng, Shuiyang Shi, Pan Fang, Hongning Zhang, Ziqi Pan, Shijie Huang, Weiqi Xia, Honglin Li, Zhenyu Zeng, Shun Zhang, Yuzong Chen, Mingkun Lu, Zhaorong Li, Feng Zhu
Rok vydání: 2023
Popis: Protein function annotation has been one of the longstanding issues, which is key for discovering drug targets and understanding physiological or pathological process. A variety of computational methods have therefore been constructed to facilitate the research developments in this particular direction. However, the annotation of protein function based on computational methods has been suffering from the serious “long-tail problem”, and it remains extremely challenging for existing methods to improve the prediction accuracies for protein families intail label levels. In this study, an innovative strategy, entitled ‘AnnoPRO’, for protein function annotation was thus constructed.First, a novel method enabling image-like protein representations was proposed. This method is unique in capturing the intrinsic correlations among protein features, which can greatly favor the application of thestate-of-the-artdeep learning methods popular in image classification.Second, a multimodal framework integrating multichannel convolutional neural network and long short-term memory neural network was constructed to realize a deep learning-based protein functional annotation. Since this framework was inspired by a reputable method used in image classification for dealing with its ‘long-tail problem’, ourAnnoPROwas expected to significantly improve the annotation performance of the protein families intail label level. Multiple case studies based on benchmark were also conducted, which confirmed the superior performance ofAnnoPROamong the existing methods. All source codes and models ofAnnoPROwere freely available to all users athttps://github.com/idrblab/AnnoPRO, and would be essential complement to existing methods.
Databáze: OpenAIRE