Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures
Author: Benjamin B. V. Louis, Luciano A. Abriata
Year: 2021
Subject: Protein Folding; Protein Stability; Protein Engineering; Protein Design; Amino Acid Substitution; Mutation; Mutagenesis; Transition Temperature; Structural Biology; Computational Biology; Machine Learning; Artificial Neural Network; Databases, Protein; Data Mining; Proteins; Molecular Biology; Biochemistry; Bioengineering; Biotechnology; Applied Microbiology and Biotechnology; Review
Source: Molecular Biotechnology
ISSN: 1559-0305, 1073-6085
Description: Predicting the effects of mutations on protein stability is a key problem in fundamental and applied biology, and it remains unsolved even for the relatively simple case of small, soluble, globular, monomeric, two-state-folding proteins. Many articles discuss the limitations of prediction methods and of the datasets used to train them, which limit their reliability in practical applications even though they capture global trends. Here, we review these and other issues by analyzing one of the most detailed, carefully curated datasets of melting temperature change (ΔTm) upon mutation for proteins with high-resolution structures. After examining the composition of this dataset to discuss imbalances and biases, we inspect several of its entries with the help of an online app for data navigation and structure display, and of a neural network that predicts ΔTm with accuracy close to that of existing prediction programs. We posit that the ΔTm predictions of our network, and likely also those of other programs, account only for a baseline-like general effect of each type of amino acid substitution, which then requires substantial corrections to reproduce the actual stability changes. These corrections differ greatly from case to case and arise from fine structural details that are not well represented in the dataset and that, although they appear reasonable upon visual inspection of the structures, are hard to encode and parametrize. Based on these observations, additional analyses, and a review of recent literature, we propose recommendations for developers of stability prediction methods and for efforts aimed at improving the datasets used for training.
We leave our interactive analysis interface available online at http://lucianoabriata.altervista.org/papersdata/proteinstability2021/s1626navigation.html so that users can further explore the dataset and the baseline predictions; it may also serve as a tool for structural biology and protein biotechnology research and as teaching material in protein biophysics.
Database: OpenAIRE
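The description's central claim is that ΔTm predictions mostly capture an average effect per type of amino acid substitution, with large case-specific corrections left unexplained. The minimal sketch below illustrates that decomposition under stated assumptions. It is not the authors' neural network: it simply averages ΔTm per (wild-type, mutant) pair as a "baseline" and reports the residual each entry would still need. The records are hypothetical toy values, not entries from the dataset, and the names `fit_baseline` and `residuals` are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (wild-type residue, mutant residue, measured ΔTm in °C).
# These values are invented for illustration and are NOT taken from the dataset.
TOY_RECORDS = [
    ("L", "A", -4.0),
    ("L", "A", -6.0),
    ("L", "A", -2.0),
    ("G", "A", 1.5),
    ("G", "A", 0.5),
]


def fit_baseline(records):
    """Return the mean ΔTm for each (wild-type, mutant) substitution type."""
    groups = defaultdict(list)
    for wt, mut, dtm in records:
        groups[(wt, mut)].append(dtm)
    return {key: mean(values) for key, values in groups.items()}


def residuals(records, baseline):
    """Return the case-specific correction each record needs on top of the baseline."""
    return [dtm - baseline[(wt, mut)] for wt, mut, dtm in records]


baseline = fit_baseline(TOY_RECORDS)
corrections = residuals(TOY_RECORDS, baseline)
print(baseline)     # {('L', 'A'): -4.0, ('G', 'A'): 1.0}
print(corrections)  # [0.0, -2.0, 2.0, 0.5, -0.5]
```

Even in this toy case, identical substitutions (three L→A entries) share one baseline value while their residuals span 4 °C; in the abstract's framing, these residuals are the structure-dependent corrections that are hard to encode.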