Clustering proteins into families using artificial neural networks
Author: | Pascual Ferrara, Edgardo A. Ferrán |
---|---|
Year of publication: | 1992 |
Subject: |
Statistics and Probability; Self-organizing map; Computer science; Cytochrome c Group; Biochemistry; Software Design; Feature (machine learning); Animals; Humans; Topological map; Cluster analysis; Molecular Biology; Artificial neural network; business.industry; Proteins; Pattern recognition; Composition (combinatorics); Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Artificial intelligence; Neural Networks Computer; business; Sequence Alignment; Algorithms; Software |
Source: | Europe PubMed Central |
ISSN: | 0266-7061 |
Description: | An artificial neural network was used to cluster proteins into families. The network, composed of 7 × 7 neurons, was trained with the Kohonen unsupervised learning algorithm using, as inputs, matrix patterns derived from the bipeptide composition of 447 proteins belonging to 13 different families. As a result of the training, and without any a priori indication of the number or composition of the expected families, the network self-organized the activation of its neurons into topologically ordered maps in which almost all the proteins (96.7%) were correctly clustered into their corresponding families. In a second computational experiment, a similar network was trained with one family from the previous learning set (76 cytochrome c sequences). The new neural map clustered these proteins into 25 different neurons (compared with five neurons for the same family in the first experiment), with phylogenetically related sequences positioned close to each other. This result shows that the network can adapt its clustering resolution to the complexity of the learning set, a useful feature when working with an unknown number of clusters. Although the learning stage is time-consuming, once the topological map is obtained, the classification of new proteins is very fast. Altogether, our results suggest that this novel approach may be a useful tool for organizing the search for homologies in large macromolecular databases. |
Database: | OpenAIRE |
External link: |
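The abstract describes the method only at a high level. The Python sketch below shows one way its two ingredients could fit together: a bipeptide (dipeptide) composition pattern for each sequence, and a Kohonen self-organizing map trained on those patterns without labels. Several details are assumptions, not taken from the record: the 20 × 20 composition matrix is flattened into a 400-element vector of overlapping-pair frequencies, pairs containing non-standard residues are skipped, and training uses a Gaussian neighborhood with a linearly decaying learning rate and radius. The names `bipeptide_composition` and `KohonenMap`, and the two toy "families" of repeat sequences, are hypothetical illustrations rather than the authors' code or data.

```python
import math
import random
from itertools import product

# Assumption: only the 20 standard amino acids; pairs with any other symbol are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def bipeptide_composition(seq):
    """Flattened 20 x 20 matrix of overlapping dipeptide frequencies (sums to 1)."""
    counts = [0.0] * len(PAIR_INDEX)
    total = 0
    for i in range(len(seq) - 1):
        idx = PAIR_INDEX.get(seq[i:i + 2].upper())
        if idx is not None:
            counts[idx] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KohonenMap:
    """A rows x cols self-organizing map (hypothetical Gaussian-neighborhood variant)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One small random weight vector per neuron on the grid.
        self.weights = {
            (r, c): [rng.random() / dim for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def winner(self, x):
        """Grid position of the best-matching neuron; also used to classify new proteins."""
        return min(self.weights, key=lambda node: squared_distance(self.weights[node], x))

    def train(self, data, epochs=20, lr0=0.5, seed=0):
        rng = random.Random(seed)
        radius0 = max(self.rows, self.cols) / 2.0
        order = list(range(len(data)))
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            rng.shuffle(order)
            for i in order:
                x = data[i]
                br, bc = self.winner(x)
                # Pull the winner and its grid neighbors toward the input pattern.
                for (r, c), w in self.weights.items():
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                    if h < 1e-4:
                        continue  # negligible update; skip for speed
                    for k, wk in enumerate(w):
                        w[k] = wk + lr * h * (x[k] - wk)


# Toy usage with two hypothetical "families" of repeat sequences on a 7 x 7 map.
family_a = ["AGAGAGAGAGAGAGAG", "GAGAGAGAGAGAGAGA", "AGAGAGAAGAGAGAGA"]
family_b = ["KLKLKLKLKLKLKL", "LKLKLKLKLKLKLK", "KLKLKLKKLKLKLK"]
vectors = [bipeptide_composition(s) for s in family_a + family_b]
som = KohonenMap(7, 7, dim=len(PAIR_INDEX))
som.train(vectors)
winners = [som.winner(v) for v in vectors]
print(winners)
```

Once the map is trained, classifying a new protein only requires computing its composition vector and one nearest-neuron search with `winner`, which is consistent with the record's remark that classification is fast while learning is the costly stage.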