Description: |
Enhancing protein thermostability is an important goal of enzyme engineering. Many studies have investigated the properties that determine protein thermostability, but no consensus has emerged. To understand the mechanisms underlying the high thermostability of thermophilic proteins, we evaluated the relative importance of amino acid frequencies in protein sequences for discriminating between thermophilic and non-thermophilic proteins. We used machine learning algorithms together with a three-step feature selection procedure and principal component (PC) analysis to remove noisy and redundant information. Our results showed that the frequencies of the oppositely charged amino acids Lys and Glu were higher in thermophilic proteins, suggesting that electrostatic interactions are fundamentally important for protein stabilization at high temperatures. Further, the frequencies of uncharged polar amino acids, which are thermolabile or actively interact with water molecules, were lower in thermophilic proteins. Moreover, the frequencies of β-branched aliphatic amino acids tended to increase with increasing thermostability. Overall, these results suggest that protein thermostability is determined by a few protein features, which were well captured by the first two PCs. A classifier based on only the first two PCs achieved a high accuracy of 90%, suggesting that it could be an effective and efficient tool for engineering stable proteins. |
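
The input features described above are per-sequence amino acid frequencies. The following is a minimal sketch of how such features might be computed, not the authors' implementation. The function names `aa_frequencies` and `grouped_features` are hypothetical, and the group memberships (Ser, Thr, Asn, Gln as uncharged polar; Val and Ile as β-branched aliphatic) are assumptions, since the abstract does not list the residues in each group. The feature selection, PC analysis, and classifier steps are not shown.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequence):
    """Return the relative frequency of each standard amino acid in a sequence.

    Characters outside the 20 standard one-letter codes (gaps, X, etc.)
    are ignored when counting.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def grouped_features(freqs):
    """Sum frequencies over residue groups mentioned in the abstract.

    Group memberships are assumptions made for illustration only.
    """
    return {
        # Oppositely charged pair named in the abstract: Lys (K) and Glu (E).
        "lys_glu": freqs["K"] + freqs["E"],
        # Assumed uncharged polar set: Ser, Thr, Asn, Gln.
        "uncharged_polar": sum(freqs[aa] for aa in "STNQ"),
        # Assumed beta-branched aliphatic set: Val, Ile.
        "beta_branched_aliphatic": freqs["V"] + freqs["I"],
    }


if __name__ == "__main__":
    # Toy sequence, chosen only to exercise the functions.
    freqs = aa_frequencies("KEKEVIST")
    print(grouped_features(freqs))
```

In a full pipeline, the 20-dimensional frequency vectors for many sequences would be stacked into a matrix and passed to a PC analysis, with a classifier then trained on the first two components.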