SoluProt: prediction of soluble protein expression in Escherichia coli
Authors: | Jiri Hon, Tomáš Martínek, David Bednar, Jiri Damborsky, Antonin Kunka, Martin Marusiak, Jaroslav Zendulka |
---|---|
Year of publication: | 2021 |
Subject: |
Statistics and Probability; Prioritization; AcademicSubjects/SCI01060; Computer science; SOLUBILITY; WEBSERVER; TOPOLOGY; ACCURATE; Sequence alignment; Computational biology; medicine.disease_cause; Biochemistry; Protein expression; 03 medical and health sciences; medicine; Solubility; Molecular Biology; Escherichia coli; 030304 developmental biology; 0303 health sciences; Training set; Chemistry; 030302 biochemistry & molecular biology; Biological activity; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Test set; Gradient boosting; Protein solubility; Sequence Analysis |
Source: | Bioinformatics |
ISSN: | 1367-4811 1367-4803 |
Description: | Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins. Results: A new tool for sequence-based prediction of soluble protein expression in E. coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies. SoluProt is freely available as a standalone program and a user-friendly webserver at https://loschmidt.chemi.muni.cz/soluprot/. Availability and implementation: https://loschmidt.chemi.muni.cz/soluprot/. Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |
External link: |
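
The description reports two metrics on a balanced test set: accuracy and AUC. As a rough illustration of what those two numbers measure, the sketch below computes both for a binary soluble/insoluble classifier. It is not SoluProt's evaluation code; the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for the example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the true label.

    The 0.5 threshold is an assumption made for this illustration.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (soluble)
    example scores higher than a randomly chosen negative one; ties
    count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = soluble, 0 = insoluble (balanced, 3 of each).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_scores))
    print("AUC:", roc_auc(toy_labels, toy_scores))
```

On a balanced test set a predictor that guesses at random would be expected to score about 50% accuracy and an AUC of about 0.5, which is the baseline against which the reported 58.5% and 0.62 can be read.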