Classification of G-protein coupled receptors by alignment-independent extraction of principal chemical properties of primary amino acid sequences

Autor: Peteris Prusis, Claes Post, Maris Lapinsh, Jarl E. S. Wikberg, Alexandrs Gutcaits, Torbjörn Lundstedt
Rok vydání: 2002
Předmět:
Zdroj: Protein Science. 11:795-805
ISSN: 1469-896X
0961-8368
DOI: 10.1110/ps.2500102
Popis: We have developed an alignment-independent method for classification of G-protein coupled receptors (GPCRs) according to the principal chemical properties of their amino acid sequences. The method relies on a multivariate approach where the primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids and transformation of the data into a uniform matrix by applying a modified autocross-covariance transform. The application of principal component analysis to a data set of 929 class A GPCRs showed a clear separation of the major classes of GPCRs. The application of partial least squares projection to latent structures created a highly valid model (cross-validated correlation coefficient, Q(2) = 0.895) that gave unambiguous classification of the GPCRs in the training set according to their ligand binding class. The model was further validated by external prediction of 535 novel GPCRs not included in the training set. Of the latter, only 14 sequences, confined in rapidly expanding GPCR classes, were mispredicted. Moreover, 90 orphan GPCRs out of 165 were tentatively identified to GPCR ligand binding class. The alignment-independent method could be used to assess the importance of the principal chemical properties of every single amino acid in the protein sequences for their contributions in explaining GPCR family membership. It was then revealed that all amino acids in the unaligned sequences contributed to the classifications, albeit to varying extent; the most important amino acids being those that could also be determined to be conserved by using traditional alignment-based methods.
Databáze: OpenAIRE