Autor: |
Mukhopadhyay, Snehasis, Tang, Changhong, Huang, Jeffery, Palakal, Mathew |
Zdroj: |
Journal of VLSI Signal Processing for Signal, Image & Video Technology; Nov2003, Vol. 35 Issue 3, p273-285, 13p |
Abstrakt: |
Although large-scale classification studies of genetic sequence data are in progress around the world, very few studies compare different classification approaches, e.g. unsupervised and supervised, in terms of objective criteria such as classification accuracy and computational complexity. In this paper, we study such criteria for both unsupervised and supervised classification of a relatively large sequence data set. The unsupervised approach involves use of different sequence alignment algorithms (e.g., Smith-Waterman, FASTA and BLAST) followed by clustering using the Maximin algorithm. The supervised approach uses a suitable numeric encoding (relative frequencies of tuples of nucleotides followed by principal component analysis) which is fed to a Multi-layer Backpropagation Neural Network. Classification experiments conducted on IBM-SP parallel computers show that FASTA with unsupervised Maximin leads to best trade-off between accuracy and speed among all methods, followed by supervised neural networks as the second best approach. Finally, the different classifiers are applied to the problem of cross-species homology detection. [ABSTRACT FROM AUTHOR] |
Databáze: |
Complementary Index |
Externí odkaz: |
|