LMPTMSite: A Platform for PTM Site Prediction in Proteins Leveraging Transformer-Based Protein Language Models.
Author: | Pratyush P; Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA., Pokharel S; Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA., Ismail HD; Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA.; North Carolina A&T State University, Computational Data Science and Engineering, Greensboro, NC, USA., Bahmani S; Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA.; Michigan Technological University, Computer Science Department, Houghton, MI, USA., Kc DB; Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA. dkcvcs@rit.edu. |
---|---|
Language: | English |
Source: | Methods in molecular biology (Clifton, N.J.) [Methods Mol Biol] 2025; Vol. 2867, pp. 261-297. |
DOI: | 10.1007/978-1-0716-4196-5_16 |
Abstract: | Protein post-translational modifications (PTMs) introduce new functionalities and play a critical role in the regulation of protein functions. Characterizing these modifications, especially PTM sites, is essential for unraveling complex biological systems. However, traditional experimental approaches, such as mass spectrometry, are time-consuming and expensive. Machine learning and deep learning techniques offer promising alternatives for predicting PTM sites. In this chapter, we introduce our LMPTMSite (language model-based post-translational modification site predictor) platform, which emphasizes two transformer-based protein language model (pLM) approaches: pLMSNOSite and LMSuccSite, for the prediction of S-nitrosylation sites and succinylation sites in proteins, respectively. We highlight the various methods of using pLM-based sequence encoding, explain the underlying deep learning architectures, and discuss the superior efficacy of these tools compared to other state-of-the-art tools. Subsequently, we present an analysis of runtime and memory usage for pLMSNOSite, with a focus on CPU and RAM usage as the input sequence length is scaled up. Finally, we showcase a case study predicting succinylation sites in proteins active within the tricarboxylic acid (TCA) cycle pathway using LMSuccSite, demonstrating its potential utility and efficiency in real-world biological contexts. The LMPTMSite platform, inclusive of pLMSNOSite and LMSuccSite, is freely available both as a web server (http://kcdukkalab.org/pLMSNOSite/ and http://kcdukkalab.org/LMSuccSite/) and as standalone packages (https://github.com/KCLabMTU/pLMSNOSite and https://github.com/KCLabMTU/LMSuccSite), providing valuable tools for researchers in the field. (© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.) |
Database: | MEDLINE |