GORetriever: reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation.

Autor: Yan H; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China., Wang S; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China., Liu H; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China., Mamitsuka H; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan.; Department of Computer Science, Aalto University, Espoo 00076, Finland., Zhu S; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China.; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433, China.; Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, 200433, China.; Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China.
Jazyk: angličtina
Zdroj: Bioinformatics (Oxford, England) [Bioinformatics] 2024 Sep 01; Vol. 40 (Suppl 2), pp. ii53-ii61.
DOI: 10.1093/bioinformatics/btae401
Abstrakt: Summary: The vast majority of proteins still lack experimentally validated functional annotations, which highlights the importance of developing high-performance automated protein function prediction/annotation (AFP) methods. While existing approaches focus on protein sequences, networks, and structural data, textual information related to proteins has been overlooked. However, roughly 82% of SwissProt proteins already possess literature information that experts have annotated. To efficiently and effectively use literature information, we present GORetriever, a two-stage deep information retrieval-based method for AFP. Given a target protein, in the first stage, candidate Gene Ontology (GO) terms are retrieved by using annotated proteins with similar descriptions. In the second stage, the GO terms are reranked based on semantic matching between the GO definitions and textual information (literature and protein description) of the target protein. Extensive experiments over benchmark datasets demonstrate the remarkable effectiveness of GORetriever in enhancing the AFP performance. Note that GORetriever is the key component of GOCurator, which has achieved first place in the latest critical assessment of protein function annotation (CAFA5: over 1600 teams participated), held in 2023-2024.
Availability and Implementation: GORetriever is publicly available at https://github.com/ZhuLab-Fudan/GORetriever.
(© The Author(s) 2024. Published by Oxford University Press.)
Databáze: MEDLINE