ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

Authors:	Arno Meiler, Claudia Klinger, Michael Kaufmann
Year of publication:	2012
Subject:	Sequence analysis; Archaeal Proteins; Context (language use); Computational biology; Biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Cog; Structural Biology; Amino Acids; Codon; Databases, Protein; lcsh:QH301-705.5; Molecular Biology; Phylogeny; Sequence (medicine); Genetics; chemistry.chemical_classification; Base Composition; Nucleotides; Applied Mathematics; Temperature; Nucleic acid sequence; Proteins; Archaea; Computer Science Applications; Amino acid; lcsh:Biology (General); chemistry; Phylogenetic Pattern; Codon usage bias; lcsh:R858-859.7; Sequence Analysis; Software
Source:	BMC Bioinformatics, Vol 13, Iss 1, p 223 (2012)
ISSN:	1471-2105
Description:	Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid sequence and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC content in a species-specific or function-specific context. With respect to amino acids, we confirm, at least in part, the cognate bias hypothesis, using ANCAC's NUCOCOG dataset, the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. To our knowledge, it is thereby the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context that does not require substantial programming skills.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6746b4ed31cb9edb859a6db1f4d6e4c3 https://doi.org/10.1186/1471-2105-13-223
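
Note:	The description mentions codon frequencies and GC content as quantities analyzed per sequence. As a rough illustration of what those two measures are, the following minimal Python sketch computes them for a single coding sequence. It is not ANCAC's implementation and does not touch the NUCOCOG database; the function names (gc_content, codon_counts, codon_frequencies) and the choice to skip codons containing ambiguous bases are assumptions made for this example only.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt_total = sum(seq.count(base) for base in "ACGT")
    if acgt_total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt_total


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence.

    Assumption for this sketch: a trailing partial codon is ignored, and
    codons containing anything other than A/C/G/T are skipped.
    """
    cds = cds.upper()
    counts = Counter()
    usable_length = len(cds) - len(cds) % 3
    for start in range(0, usable_length, 3):
        codon = cds[start:start + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each observed codon (values sum to 1)."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}
```

Calling codon_frequencies on each nucleotide sequence of a group and averaging the results would give a group-level profile; how ANCAC itself aggregates over COGs and phylogenetic patterns is not stated in this record.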