A novel improved prediction of protein structural class using deep recurrent neural network

Author:	Bishnupriya Panda, Babita Majhi
Year of publication:	2018
Subject:	Protein structural class; Computer science; business.industry; Cognitive Neuroscience; Feature vector; 020206 networking & telecommunications; Pattern recognition; 02 engineering and technology; High dimensional; Mathematics (miscellaneous); Recurrent neural network; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Entropy (information theory); 020201 artificial intelligence & image processing; Word2vec; Computer Vision and Pattern Recognition; Artificial intelligence; business; Natural language; Curse of dimensionality
Source:	Evolutionary Intelligence. 14:253-260
ISSN:	1864-5917 1864-5909
DOI:	10.1007/s12065-018-0171-3
Description:	For the last few decades, the sequence arrangement of amino acids has been used to predict protein secondary structure. Recent methods have applied high-dimensional, natural-language-based features in machine learning models. The performance of such models is significantly affected by data size and data dimensionality, and it is a major challenge to develop a generic model that can be trained to perform well on both small and large datasets in a low-dimensional framework. In the present research, we propose a low-dimensional representation for both small and large datasets. A hybrid space of Atchley’s factors II, IV and V, the electron-ion interaction potential, and SkipGram-based word2vec is employed to represent amino acid sequences. The Stockwell transform is then applied to this representation to preserve features in both the time and frequency domains. Finally, a deep gated recurrent network with dropout, categorical cross-entropy loss and Adam optimization is used for classification. The proposed method yields better prediction accuracies for both small (204, 277 and 498) and large (PDB25, Protein 640 and FC699) benchmark datasets of low sequence similarity (25–40%). The classification accuracies obtained for the PDB25, 640, FC699, 498, 277 and 204 datasets are 84.2%, 94.31%, 93.1%, 95.9%, 94.5% and 85.36%, respectively. The major contributions of this research are twofold. First, for the first time, we verify protein secondary structural class prediction in a very low-dimensional (18-D) feature space with a novel feature representation method. Second, we verify, also for the first time, the behaviour of deep networks on low-dimensional, small datasets.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::09e3b02cad0849169ddddf60fa761f3a https://doi.org/10.1007/s12065-018-0171-3
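
Note on the description: the abstract names the Stockwell transform as the step that keeps both time and frequency information, but this record gives no implementation details. The following is only a minimal sketch of a standard discrete Stockwell (S-) transform in plain Python, not the authors' code. The function names `dft` and `stockwell`, the frequency-domain formulation with a wrapped Gaussian window, and the synthetic cosine used as input are all assumptions made for illustration; in the paper the input would be a numeric encoding of an amino acid sequence.

```python
import cmath
import math


def dft(x):
    """Normalized discrete Fourier transform: H[m] = (1/N) * sum_k x[k] e^{-2*pi*i*m*k/N}."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)) / n
        for m in range(n)
    ]


def stockwell(signal):
    """Discrete S-transform of a real or complex sequence (illustrative sketch).

    Returns a list of rows S[n][j], where n is the frequency index
    (0 .. N//2) and j is the time index (0 .. N-1).
    """
    N = len(signal)
    H = dft(signal)

    # Zero-frequency row: defined as the mean of the signal at every time step.
    mean = complex(sum(signal) / N)
    rows = [[mean] * N]

    for n in range(1, N // 2 + 1):
        # Gaussian window in the frequency domain; its width grows with n.
        # Indices above N//2 are wrapped to negative frequencies.
        gauss = [
            math.exp(-2 * math.pi ** 2 * (m if m <= N // 2 else m - N) ** 2 / n ** 2)
            for m in range(N)
        ]
        # Shift the spectrum by n and apply the window.
        windowed = [H[(m + n) % N] * gauss[m] for m in range(N)]
        # Inverse DFT (no 1/N factor, since dft() is already normalized).
        row = [
            sum(windowed[m] * cmath.exp(2j * math.pi * m * j / N) for m in range(N))
            for j in range(N)
        ]
        rows.append(row)
    return rows


def dominant_frequency(signal):
    """Frequency index with the largest S-transform magnitude at time 0."""
    S = stockwell(signal)
    magnitudes = [abs(row[0]) for row in S]
    return magnitudes.index(max(magnitudes))


# Synthetic stand-in for an encoded sequence: a cosine with 4 cycles over 32 samples.
example = [math.cos(2 * math.pi * 4 * k / 32) for k in range(32)]
peak = dominant_frequency(example)
```

Each row of the result is a time series for one frequency, so the output can be read as a time-frequency map of the input. This direct version costs O(N^3) operations and is meant only to show the structure of the transform; a practical implementation would use an FFT.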