Author: |
Changge Guan, Jiawei Luo, Shucheng Li, Zheng Lin Tan, Jiahao Li, Zourun Wu, Yi Wang, Haihong Chen, Naoyuki Yamamoto, Chong Zhang, Yuan Lu, Junjie Chen, Xin-Hui Xing |
Year of publication: |
2022 |
Description: |
Dipeptidyl peptidase IV (DPP-IV) inhibitory peptides (DPP-IV-IPs) are next-generation anti-diabetic drugs. Only a limited number of DPP-IV-IPs have been discovered because few efficient tools exist for peptide mining. In this study, we propose a peptide language model (PLM) that identifies DPP-IV-IPs in large-scale peptide datasets with a state-of-the-art accuracy of 0.894. Visualization of the model’s attention shows that the model automatically learns physicochemical and structural information from peptide sequences, i.e., DPP-IV cleavage sites, and can distinguish and represent amino acid sequences in high-dimensional space. We demonstrate that the PLM captures cleavage site information and can guide biological experimental screening, which has not been previously reported. In follow-up biological assays, the prediction accuracy of PLM-assisted DPP-IV-IP screening was 90%. To explore the diversity of DPP-IV-IPs, we propose a strategy based on the dipeptide repeat unit X-proline and show through modeling and biological experiments that the strategy is feasible. |
Database: |
OpenAIRE |