CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides.

Authors: Ouellet S (Ottawa, Canada); Ferguson L (Neurobiology Division, MRC Laboratory of Molecular Biology, Cambridge, UK); Lau AZ (Medical Biophysics, University of Toronto, Toronto, ON, Canada; Physical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, Canada); Lim TKY (Vancouver, Canada; Department of Pharmacology, University of Cambridge, Cambridge, UK; tl581@cam.ac.uk)
Language: English
Source: BMC bioinformatics [BMC Bioinformatics] 2023 May 16; Vol. 24 (1), pp. 200. Date of Electronic Publication: 2023 May 16.
DOI: 10.1186/s12859-023-05327-8
Abstract: Background: Cysteine-dense peptides (CDPs) are an attractive pharmaceutical scaffold that displays extreme biochemical properties, low immunogenicity, and the ability to bind targets with high affinity and selectivity. While many CDPs have potential and confirmed therapeutic uses, synthesis of CDPs is a challenge. Recent advances have made the recombinant expression of CDPs a viable alternative to chemical synthesis. Moreover, identifying CDPs that can be expressed in mammalian cells is crucial in predicting their compatibility with gene therapy and mRNA therapy. Currently, we lack the ability to identify CDPs that will express recombinantly in mammalian cells without labour-intensive experimentation. To address this, we developed CysPresso, a novel machine learning model that predicts recombinant expression of CDPs based on primary sequence.
Results: We tested various protein representations generated by deep learning algorithms (SeqVec, proteInfer, AlphaFold2) for their suitability in predicting CDP expression and found that AlphaFold2 representations possessed the best predictive features. We then optimized the model by concatenation of AlphaFold2 representations, time series transformation with random convolutional kernels, and dataset partitioning.
Conclusion: Our novel model, CysPresso, is the first to successfully predict recombinant CDP expression in mammalian cells and is particularly well suited for predicting recombinant expression of knottin peptides. When preprocessing the deep learning protein representation for supervised machine learning, we found that random convolutional kernel transformation preserves more of the information relevant to predicting expressibility than embedding averaging does. Our study showcases the applicability of deep learning-based protein representations, such as those provided by AlphaFold2, in tasks beyond structure prediction.
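The contrast between embedding averaging and random convolutional kernel transformation can be illustrated with a minimal sketch. This is not the CysPresso implementation: it assumes a per-residue representation stored as a (length, channels) NumPy array, and the function names, kernel lengths, and kernel count are hypothetical choices made for illustration. The kernel transform follows the general idea of ROCKET-style features (random mean-centred weights, random bias and dilation, then the maximum and the proportion of positive values per kernel), simplified to one input channel per kernel and no padding.

```python
import numpy as np


def mean_pool(embedding):
    """Average a (length, channels) per-residue embedding over the sequence axis."""
    return embedding.mean(axis=0)


def random_kernel_features(embedding, n_kernels=100, seed=0):
    """Simplified ROCKET-style transform (illustrative, not the paper's code).

    Each random kernel is convolved along the sequence axis of one randomly
    chosen channel; two features are kept per kernel: the maximum response
    and the proportion of positive values (PPV).
    """
    rng = np.random.default_rng(seed)
    length, n_channels = embedding.shape
    if length < 3:
        raise ValueError("sequence must have at least 3 positions")
    # Hypothetical kernel lengths, restricted to those that fit the sequence.
    candidate_lengths = [k for k in (3, 5, 7) if k <= length]
    features = np.empty(2 * n_kernels)
    for k in range(n_kernels):
        klen = int(rng.choice(candidate_lengths))
        channel = int(rng.integers(n_channels))
        weights = rng.normal(size=klen)
        weights -= weights.mean()  # mean-centre the kernel weights
        bias = rng.uniform(-1.0, 1.0)
        # Sample a dilation on a log scale so the kernel still fits the sequence.
        max_exp = np.log2((length - 1) / (klen - 1))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        n_out = length - (klen - 1) * dilation
        signal = embedding[:, channel]
        # Dilated convolution without padding.
        out = np.full(n_out, bias)
        for j in range(klen):
            start = j * dilation
            out += weights[j] * signal[start:start + n_out]
        features[2 * k] = out.max()
        features[2 * k + 1] = (out > 0).mean()
    return features
```

Averaging discards residue order: reversing the sequence axis leaves `mean_pool` unchanged, whereas the kernel features generally change, which is one intuition for why a kernel transform can retain more position-dependent information. Either feature vector could then be passed to a standard classifier.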
(© 2023. The Author(s).)
Database: MEDLINE