Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction.

Author: Zeng S, Wang D, Jiang L, Xu D (xudong@missouri.edu); all authors: Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA.
Language: English
Source: Genome research [Genome Res] 2024 Oct 11; Vol. 34 (9), pp. 1445-1454. Date of Electronic Publication: 2024 Oct 11.
DOI: 10.1101/gr.279132.124
Abstract: Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provide a new opportunity for SP prediction, especially for SP categories with limited annotated data. We present a parameter-efficient fine-tuning (PEFT) framework for SP prediction, PEFT-SP, to effectively utilize pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the protein sequence evolutionary knowledge encoded in PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, yielding a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SPs with small training samples and an overall MCC gain of 6.1%. We also employed two other PEFT methods, prompt tuning and adapter tuning, in ESM-2 for SP prediction. Further experiments show that PEFT-SP using adapter tuning can also improve on the state-of-the-art results, with up to a 28.1% MCC gain for SPs with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during the training stage, making it possible to adapt larger and more powerful protein models for SP prediction.
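The sketch below illustrates the general LoRA recipe the abstract describes: wrapping a pretrained ESM-2 checkpoint with low-rank adapters so that only a small fraction of parameters is trained for per-residue classification. It uses the Hugging Face transformers and peft libraries; the checkpoint name, LoRA rank, and target modules are illustrative assumptions, not the authors' PEFT-SP configuration.

```python
# Minimal sketch: LoRA on an ESM-2 checkpoint via Hugging Face `peft`.
# Not the authors' PEFT-SP code; hyperparameters here are assumptions.
import torch
from transformers import AutoTokenizer, EsmForTokenClassification
from peft import LoraConfig, get_peft_model

model_name = "facebook/esm2_t12_35M_UR50D"  # small ESM-2 checkpoint (assumed)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Token-level classification head: one label per residue (e.g., SP vs. non-SP).
model = EsmForTokenClassification.from_pretrained(model_name, num_labels=2)

# LoRA: freeze pretrained weights and train low-rank updates on the attention
# query/value projections; r and lora_alpha are illustrative choices.
lora_config = LoraConfig(
    task_type="TOKEN_CLS",
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable

# Forward pass on a toy protein sequence (example only).
seq = "MKTLLLTLVVVTIVCLDLGYT"
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence length, num_labels)
```

The same wrapped model can then be trained with a standard PyTorch loop or the transformers Trainer; because only the adapter weights and the classification head receive gradients, memory use during training stays far below that of full fine-tuning, which is the practical advantage the abstract attributes to LoRA.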
(© 2024 Zeng et al.; Published by Cold Spring Harbor Laboratory Press.)
Database: MEDLINE