A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features

Autor: Changli Feng, Zhaogui Ma, Deyun Yang, Xin Li, Jun Zhang, Yanjuan Li
Jazyk: angličtina
Rok vydání: 2020
Předmět:
Zdroj: Frontiers in Bioengineering and Biotechnology, Vol 8 (2020)
Druh dokumentu: article
ISSN: 2296-4185
DOI: 10.3389/fbioe.2020.00285
Popis: The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used to train and predict the data. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant and insensitive elements, it also provided essential information for enzyme design.
Databáze: Directory of Open Access Journals