Automatic classification of protein sequences into structure/function groups via parallel cascade identification: a feasibility study

Author:	Ian W. Hunter, Jerry E. Solomon, Michael J. Korenberg, Robert David
Year of publication:	2000
Subject:	Sequence; Nonlinear system identification; Protein Conformation; Biomedical Engineering; System identification; Cell Polarity; Proteins; Pattern recognition; Numerical Analysis, Computer-Assisted; Function (mathematics); Biology; Markov Chains; Weighting; Set (abstract data type); Nonlinear Dynamics; Sequence Analysis, Protein; Test set; Feasibility Studies; Artificial intelligence; Hidden Markov model; Algorithms
Source:	Annals of Biomedical Engineering. 28(7)
ISSN:	0090-6964
Description:	A recent paper introduced the approach of using nonlinear system identification as a means for automatically classifying protein sequences into their structure/function families. The particular technique utilized, known as parallel cascade identification (PCI), could train classifiers on a very limited set of exemplars from the protein families to be distinguished and still achieve impressively good two-way classifications. For the nonlinear system classifiers to have numerical inputs, each amino acid in the protein was mapped into a corresponding hydrophobicity value, and the resulting hydrophobicity profile was used in place of the primary amino acid sequence. While the ensuing classification accuracy was gratifying, the use of (Rose scale) hydrophobicity values had some disadvantages. These included representing multiple amino acids by the same value, weighting some amino acids more heavily than others, and covering a narrow numerical range, resulting in a poor input for system identification. This paper introduces binary and multilevel sequence codes to represent amino acids, for use in protein classification. The new binary and multilevel sequences, which are still able to encode information such as hydrophobicity, polarity, and charge, avoid the above disadvantages and increase classification accuracy. Indeed, over a much larger test set than in the original study, parallel cascade models using numerical profiles constructed with the new codes achieved slightly higher two-way classification rates than did hidden Markov models (HMMs) using the primary amino acid sequences, and combining PCI and HMM approaches increased accuracy. © 2000 Biomedical Engineering Society. PAC00: 8714Ee, 8715Cc, 3620Fz, 8715Aa
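The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the Kyte-Doolittle hydrophobicity scale stands in for the Rose scale, and `distinct_level_code` is a hypothetical multilevel code invented here only to show how giving each amino acid its own evenly spaced level removes ties and widens the numerical range. The function names are assumptions as well.

```python
# Kyte-Doolittle hydropathy values (a stand-in; the paper used the Rose scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence):
    """Map each residue of a one-letter protein sequence to its scale value."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def distinct_level_code(scale):
    """Hypothetical multilevel code (not the paper's codes).

    Residues are ranked by scale value, with ties broken alphabetically,
    and assigned evenly spaced, distinct levels in [-1, 1]. The ordering
    still reflects hydrophobicity, but no two residues share a value.
    """
    ranked = sorted(scale, key=lambda residue: (scale[residue], residue))
    n = len(ranked)
    return {residue: -1.0 + 2.0 * i / (n - 1) for i, residue in enumerate(ranked)}


def encode(sequence, code):
    """Encode a protein sequence as a numerical profile using the given code."""
    return [code[residue] for residue in sequence.upper()]


if __name__ == "__main__":
    peptide = "DEQN"  # four residues that share one Kyte-Doolittle value
    print(hydrophobicity_profile(peptide))
    print(encode(peptide, distinct_level_code(KYTE_DOOLITTLE)))
```

With the raw scale, the four residues in `DEQN` collapse to a single value, which is the kind of ambiguity the abstract lists as a disadvantage; under the hypothetical code each residue receives its own level.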
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0e34ce51483adfbf5cdcf7fa583bb720 https://pubmed.ncbi.nlm.nih.gov/11016417