PCA and K-Means Decipher Genome.

Author: Barth, Timothy J., Griebel, Michael, Keyes, David E., Nieminen, Risto M., Roose, Dirk, Schlick, Tamar, Kégl, Balázs, Wunsch, Donald C., Gorban, Alexander N., Zinovyev, Andrei Y.
Source: Principal Manifolds for Data Visualization & Dimension Reduction; 2007, p309-323, 15p
Abstract: In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of genetic text of a bacterial genome and analyze its structure. By means of principal component analysis they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of the basic bioinformatics notions. The Appendix contains program listings that go along with this exercise. [ABSTRACT FROM AUTHOR]
Database: Supplemental Index
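
Note: The exercise outlined in the abstract might be sketched as follows. This is a minimal illustration, not the program listings from the paper's Appendix. The window width of 300 letters, the function names, the decision to center the data without rescaling it, and the plain NumPy K-Means loop are all assumptions made for this sketch.

```python
import itertools
import numpy as np

# All 64 possible triplets over the alphabet A, C, G, T.
CODONS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
INDEX = {c: i for i, c in enumerate(CODONS)}


def triplet_frequencies(seq, window=300, step=None):
    """Cut seq into windows and count non-overlapping triplets in each.

    Returns an array of shape (n_windows, 64) whose rows sum to 1
    (or to 0 if a window contains no valid triplet).
    The window width is an assumed value, not taken from the record.
    """
    step = step or window
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        counts = np.zeros(64)
        # Read triplets in the frame that begins at the window start.
        for i in range(start, start + window - 2, 3):
            idx = INDEX.get(seq[i:i + 3])
            if idx is not None:  # skip triplets with letters outside ACGT
                counts[idx] += 1
        rows.append(counts / max(counts.sum(), 1))
    return np.array(rows)


def pca_project(X, n_components=2):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; initial centers are k distinct data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = X[labels == j].mean(axis=0)
    # Final assignment against the updated centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers
```

A typical call sequence would be `F = triplet_frequencies(genome_text)`, then `Y = pca_project(F, 2)` for a scatter plot, then `labels, centers = kmeans(Y, k)`, where `genome_text` is a string of A, C, G and T and `k` is the number of clusters the student chooses to try. Repeating the frequency count with word lengths other than 3 and comparing the PCA plots is one way the triplet structure could be made visible.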