A tree kernel to analyse phylogenetic profiles

Author: Jean-Philippe Vert
Contributors: Bioinformatics Center (KEGG), Kyoto University [Kyoto]; Vert, Jean-Philippe
Year of publication: 2002
Subjects:
Graph kernel
genetic structures
02 engineering and technology
Biochemistry
Kernel principal component analysis
Pattern Recognition, Automated
MESH: Saccharomyces cerevisiae Proteins
String kernel
0202 electrical engineering, electronic engineering, information engineering
MESH: Pattern Recognition, Automated
MESH: Models, Genetic
MESH: Phylogeny
Phylogeny
MESH: Evolution, Molecular
[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]
Mathematics
0303 health sciences
[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Phylogenetic tree
food and beverages
MESH: Gene Expression Regulation
MESH: Saccharomyces cerevisiae
Computer Science Applications
Computational Mathematics
Kernel method
Computational Theory and Mathematics
Kernel (statistics)
020201 artificial intelligence & image processing
Tree kernel
Algorithms
Statistics and Probability
Saccharomyces cerevisiae Proteins
information science
MESH: Algorithms
Saccharomyces cerevisiae
Evolution, Molecular
MESH: Gene Expression Profiling
03 medical and health sciences
Artificial Intelligence
MESH: Artificial Intelligence
Molecular Biology
030304 developmental biology
Models, Statistical
Models, Genetic
business.industry
Gene Expression Profiling
Pattern recognition
Support vector machine
ComputingMethodologies_PATTERNRECOGNITION
Gene Expression Regulation
business
MESH: Models, Statistical
Source: ISMB
Bioinformatics
Bioinformatics, Oxford University Press (OUP), 2002, 18 Suppl 1, pp. S276-84
Scopus-Elsevier
ISSN: 1367-4811, 1367-4803
DOI: 10.1093/bioinformatics/18.suppl_1.s276
Description:
Motivation: The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every fully sequenced genome. Because proteins that participate in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion, the phylogenetic profiles of such proteins are often ‘similar’ or at least ‘related’ to each other. The question we address in this paper is how to measure the ‘similarity’ between two profiles in an evolutionarily relevant way, in order to develop efficient function prediction methods.
Results: We show how the profiles can be mapped to a high-dimensional vector space that incorporates evolutionarily relevant information, and we provide an efficient algorithm to compute the inner product in that space; we call this inner product the tree kernel. The tree kernel can be used by any kernel-based analysis method for classification or data mining of phylogenetic profiles. As an application, a Support Vector Machine (SVM) trained to predict the functional class of a gene from its phylogenetic profile is shown to perform better with the tree kernel than with a naive kernel that does not include any information about the phylogenetic relationships among species. Moreover, a kernel principal component analysis (KPCA) of the phylogenetic profiles illustrates the sensitivity of the tree kernel to evolutionarily relevant variations.
Availability: All data and software used are freely and publicly available upon request.
Contact: Jean-Philippe.Vert@mines.org
Keywords: phylogenetic profile; tree; kernel; support vector machine; gene function prediction.
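The abstract's central idea is to map profiles into a feature space that reflects the species tree and to compare them by an inner product there. The Python sketch below is only a rough illustration of that idea and is not the tree kernel algorithm from the paper. It contrasts two kernels. The first is a plain dot product of binary profiles, which we assume as a stand-in for the "naive kernel". The second is a simplified clade-based feature map invented for this example, in which each node of the species tree contributes one feature: the fraction of the genomes under that node that contain the protein. The four-species tree, the profiles and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical representation: a species tree as nested tuples, strings are leaves (genomes).
Tree = Union[str, Tuple["Tree", ...]]
# A phylogenetic profile: genome name -> 1 (present) or 0 (absent).
Profile = Dict[str, int]


def clades(tree: Tree) -> List[Tuple[str, ...]]:
    """Leaf sets of every node (leaves and internal nodes), in post-order."""
    if isinstance(tree, str):
        return [(tree,)]
    nodes: List[Tuple[str, ...]] = []
    leaves: List[str] = []
    for child in tree:
        child_nodes = clades(child)
        nodes.extend(child_nodes)
        leaves.extend(child_nodes[-1])  # last entry is the child's own leaf set
    nodes.append(tuple(leaves))
    return nodes


def naive_kernel(p: Profile, q: Profile, genomes: Sequence[str]) -> int:
    """Assumed naive kernel: number of genomes in which both proteins are present."""
    return sum(p[g] * q[g] for g in genomes)


def clade_features(p: Profile, nodes: List[Tuple[str, ...]]) -> List[float]:
    """Illustrative feature map: one feature per tree node (fraction of its genomes with the protein)."""
    return [sum(p[g] for g in node) / len(node) for node in nodes]


def clade_kernel(p: Profile, q: Profile, nodes: List[Tuple[str, ...]]) -> float:
    """Inner product of the clade features, hence a valid kernel by construction."""
    fp, fq = clade_features(p, nodes), clade_features(q, nodes)
    return sum(a * b for a, b in zip(fp, fq))


def gram_matrix(profiles: List[Profile], nodes: List[Tuple[str, ...]]) -> List[List[float]]:
    """Pairwise kernel values, e.g. for a method that accepts a precomputed kernel matrix."""
    return [[clade_kernel(p, q, nodes) for q in profiles] for p in profiles]


# Hypothetical species tree ((A,B),(C,D)): A and B are sister species.
species_tree: Tree = (("A", "B"), ("C", "D"))
genomes = ["A", "B", "C", "D"]
nodes = clades(species_tree)

x = {"A": 1, "B": 0, "C": 0, "D": 0}  # present only in A
y = {"A": 0, "B": 1, "C": 0, "D": 0}  # present only in B (close to A)
z = {"A": 0, "B": 0, "C": 1, "D": 0}  # present only in C (far from A)

print(naive_kernel(x, y, genomes), naive_kernel(x, z, genomes))  # 0 0
print(clade_kernel(x, y, nodes), clade_kernel(x, z, nodes))      # 0.3125 0.0625
```

Under the naive kernel, a protein found only in A looks equally unrelated to one found only in its sister species B and to one found only in the distant species C. The clade-based features assign the sister-species pair the higher similarity. Each entry is an explicit inner product, so the Gram matrix is symmetric positive semidefinite. It could, for instance, be passed to scikit-learn's `SVC` or `KernelPCA` with `kernel="precomputed"`.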
Database: OpenAIRE