Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing.
Authors: | Nguyen VN; University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Viet Nam., Tran TX; University of Economics and Business Administration, Thai Nguyen University, Thai Nguyen, Viet Nam. Electronic address: tranxuantbhd@tueba.edu.vn., Nguyen TT; University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Viet Nam., Le NQK; In-Service Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan. Electronic address: khanhlee@tmu.edu.tw. |
---|---|
Language: | English |
Source: | Methods (San Diego, Calif.) [Methods] 2024 Dec; Vol. 232, pp. 65-71. Date of Electronic Publication: 2024 Oct 22. |
DOI: | 10.1016/j.ymeth.2024.10.006 |
Abstract: | Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, and species-specific variations in ubiquitination patterns remain poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species "Teacher model" to guide a more compact, species-specific "Student model", with the "Teacher" generating pseudo-labels that enhance the "Student" model's learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3% and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3% and an AUC of 0.923. Comparative analysis with established predictors further highlights the model's superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2024 Elsevier Inc. All rights reserved.) |
Database: | MEDLINE |
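
The teacher-student setup described in the abstract can be illustrated with a generic knowledge distillation loss for a binary site classifier. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the mixing weight `alpha`, and the `temperature` parameter are all hypothetical, and the published model may combine teacher pseudo-labels and true labels differently.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(target: float, prob: float, eps: float = 1e-12) -> float:
    """Cross-entropy between a (possibly soft) target and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(
    student_logit: float,
    teacher_logit: float,
    hard_label: int,
    alpha: float = 0.5,        # hypothetical weight on the teacher's pseudo-label
    temperature: float = 2.0,  # hypothetical softening temperature
) -> float:
    """Blend a soft-label term (teacher pseudo-label) with a hard-label term."""
    # The teacher's softened probability acts as the pseudo-label.
    soft_target = sigmoid(teacher_logit / temperature)
    soft_pred = sigmoid(student_logit / temperature)
    soft_term = binary_cross_entropy(soft_target, soft_pred)

    # Standard supervised term against the annotated site label.
    hard_term = binary_cross_entropy(float(hard_label), sigmoid(student_logit))

    return alpha * soft_term + (1.0 - alpha) * hard_term


# A student that agrees with the teacher and the true label is penalized less
# than one that contradicts both.
agree = distillation_loss(student_logit=2.0, teacher_logit=2.5, hard_label=1)
disagree = distillation_loss(student_logit=-2.0, teacher_logit=2.5, hard_label=1)
print(agree < disagree)
```

In this sketch, setting `alpha` to 0 reduces the loss to ordinary supervised training on the annotated labels, while larger values let the teacher's pseudo-labels dominate.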