A model to predict the function of hypothetical proteins through a nine-point classification scoring schema

Authors:	Vijayaraghava Seshadri Sundararajan, Anuj Kumar, Girik Malik, Partha Sarathi Das, Neeraja Bethi, Prashanth Suravajhala, Narendra Meena, Johny Ijaq
Language:	English
Year of publication:	2019
Subject:	Computer science; Decision tree; Hypothetical proteins; Classification features; Machine learning; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; 03 medical and health sciences; Naive Bayes classifier; 0302 clinical medicine; C4.5 algorithm; Structural Biology; Humans; Molecular Biology; lcsh:QH301-705.5; Organism; 030304 developmental biology; 0303 health sciences; Methodology Article; Applied Mathematics; Proteins; Bayes Theorem; Functional genomics; Perceptron; Computer Science Applications; Schema (genetic algorithms); lcsh:Biology (General); 030220 oncology & carcinogenesis; DECIPHER; lcsh:R858-859.7; Artificial intelligence; DNA microarray; Heuristics
Source:	BMC Bioinformatics, Vol 20, Iss 1, Pp 1-8 (2019)
ISSN:	1471-2105
Description:	Background: Hypothetical proteins (HPs) are proteins that are predicted to be expressed in an organism but for which no evidence of existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge of understanding their diverse functions. Researchers have developed techniques to decipher the sequence-structure-function relationship, especially for functional modelling of HPs, but using these features as classifiers for HPs has not been attempted. As the number of annotation strategies has grown, next-generation sequencing methods have furthered the understanding of HP functions. Results: In our previous work, we developed a six-point classification scoring schema with annotation pertaining to protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, known databases and visualizers that were used to validate protein interactions. In this study, we introduce three more classifiers to our annotation system, viz. pseudogenes linked to HPs, homology modelling, and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics, with accuracy improving for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67). Conclusion: With the introduction of three new classification features, the nine-point classification scoring schema functionally annotates HPs with improved accuracy. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2554-y) contains supplementary material, which is available to authorized users.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8d0cb0d8a48fad13bef2003f85451520 http://link.springer.com/article/10.1186/s12859-018-2554-y