Genomic signatures for metagenomic data analysis: Exploiting the reverse complementarity of tetranucleotides

Author: Gori, F., Mavroeidis, D., Jetten, M.S.M., Marchiori, E., Chen, L.
Contributors: Chen, L.
Year of publication: 2011
Subject:
Source: Chen, L. (ed.), 2011 IEEE International Conference on Systems Biology (ISB) Zhuhai, China, September 2–4, 2011, 149-154. Red Hook : IEEE
Description: Metagenomics studies microbial communities by analyzing their genomic content, sequenced directly from the environment. To this end, metagenomic datasets, consisting of many short DNA or RNA fragments, are computationally analyzed using statistical and machine learning methods, generally for the purpose of binning or taxonomic annotation. Many of these methods act on features derived from the data through a genomic signature, where a typical genomic signature of a fragment is a vector whose entries specify the frequency with which oligonucleotides appear in that fragment. In this article we experimentally analyze the ability of existing genomic signatures to facilitate discrimination between fragments belonging to different genomes. We also propose new genomic signatures that take into account that a fragment may have been sequenced from either strand of a genome; this is achieved by exploiting the reverse complementarity of oligonucleotides. We conduct extensive experiments on in silico sampled genomic fragments in order to compare the effectiveness of existing genomic signatures with that of the signatures proposed in this article. The results indicate that using the reverse complementarity of tetranucleotides directly in the definition of a genomic signature yields performance comparable to that of the best existing signatures while using fewer features. The proposed genomic signatures therefore provide an alternative set of features for analyzing metagenomic data. Online supplementary material is available at http://www.cs.ru.nl/~gori/signature metagenomics/.
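The idea of a strand-independent tetranucleotide signature can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact definitions of the proposed signatures, so the sketch assumes one common construction in which each tetranucleotide is merged with its reverse complement and the merged counts are normalized to frequencies. The function names (`reverse_complement`, `canonical`, `rc_signature`) are hypothetical. Under this assumption the 256 tetranucleotides collapse to 136 features, since 16 of them are their own reverse complement and the remaining 240 form 120 pairs.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement
    (assumption: the lexicographically smaller of the two)."""
    return min(kmer, reverse_complement(kmer))


def rc_signature(fragment, k=4):
    """Frequency vector over reverse-complement classes of k-mers.

    Illustrative only: windows containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    fragment = fragment.upper()
    # All classes, in a fixed order, so every fragment gets the same features.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


if __name__ == "__main__":
    frag = "ACGTTGCAAGGCTAGGATCC"
    sig = rc_signature(frag)
    print(len(sig))  # number of features for k=4
    # Both strands of the same fragment give the same signature.
    print(sig == rc_signature(reverse_complement(frag)))
```

Because a k-mer and its reverse complement share one feature, the signature does not depend on which strand a fragment was read from, which is the property the description above attributes to the proposed signatures.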
Database: OpenAIRE