Effect of Feature Selection on Kinase Classification Models

Authors:	Aruna Malapati, Priyanka Purkayastha, Akhila Rallapalli, Perumal Yogeeswari, N. L. Bhanu Murthy, Dharmarajan Sriram
Year of publication:	2014
Subject:	Biological data; Computer science; business.industry; Pattern recognition; Feature selection; Random forest; ComputingMethodologies_PATTERNRECOGNITION; Dimension (vector space); Feature (computer vision); Artificial intelligence; business; Pseudo amino acid composition; Area under the ROC curve; Interpretability
Source:	Computational Intelligence in Medical Informatics ISBN: 9789812872593
Description:	Classifying kinases enables comparison of related human kinases and provides insight into kinase function and evolution. Several classification algorithms exist, but most of them fail when the feature set is high-dimensional. Selecting the relevant features matters for several reasons, including model simplification, computational efficiency, and feature interpretability, so feature selection techniques are generally employed in such cases. However, feature selection techniques have received only limited study for the classification of biological data. This work examines the impact of feature selection algorithms on the classification of kinases. We used a forward greedy feature selection algorithm together with a random forest classification algorithm, choosing the feature subset that maximizes the area under the ROC curve (AUC). The method identifies this subset from datasets containing physicochemical properties of kinases, such as amino acid, dipeptide, and pseudo amino acid composition. Classification performance improved with the selected feature subset compared with using all features. Our method thus indicates that groups of kinases can be classified with maximum AUC when good feature subsets are used.
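The forward greedy procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `auc`, `forward_greedy_select`, and `subset_auc`, the `min_gain` stopping threshold, and the toy data are all hypothetical. To stay dependency-free, the random forest is replaced here by a stand-in scorer that ranks samples by the sum of the selected feature values; in a setup closer to the paper, `score_subset` would presumably return a cross-validated AUC from a random forest classifier.

```python
from typing import Callable, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_greedy_select(
    n_features: int,
    score_subset: Callable[[List[int]], float],
    min_gain: float = 1e-9,  # hypothetical stopping threshold, not from the paper
) -> Tuple[List[int], float]:
    """Add one feature at a time, keeping the one that raises the score most."""
    selected: List[int] = []
    best = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every candidate extension of the current subset.
        trial = {j: score_subset(selected + [j]) for j in sorted(remaining)}
        j_best = max(trial, key=trial.get)  # lowest index wins ties
        if trial[j_best] <= best + min_gain:
            break  # no candidate improves the score, so stop
        selected.append(j_best)
        best = trial[j_best]
        remaining.remove(j_best)
    return selected, best


# Toy data (made up for illustration): 6 samples, 3 integer-valued features.
X = [[9, 1, 5], [8, 2, 1], [4, 7, 9], [5, 1, 8], [2, 3, 2], [1, 2, 6]]
y = [1, 1, 1, 0, 0, 0]


def subset_auc(cols: List[int]) -> float:
    """Stand-in scorer (NOT a random forest): AUC of the summed selected features."""
    return auc(y, [sum(row[j] for j in cols) for row in X])


selected, best = forward_greedy_select(3, subset_auc)
print(selected, best)
```

On this toy data the search keeps features 0 and 1 and rejects feature 2, because adding it lowers the AUC. Passing the scorer in as a callable keeps the greedy loop independent of the classifier, so a different model or evaluation scheme can be swapped in without touching the selection logic.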
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::03a1c454c438c19afe95f96e8c885ae9 https://doi.org/10.1007/978-981-287-260-9_8