PHROG: families of prokaryotic virus proteins clustered using remote homology
Authors: | Rubén Enrique Pérez Bucio, Julien Lossouarn, Eric Olo Ndela, Clovis Galiez, François Enault, Robin Mom, Marie-Agnès Petit, Paul Terzian, Ariane Toussaint |
---|---|
Contributors: | Laboratoire Microorganismes : Génome et Environnement (LMGE), Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA), Centre National de la Recherche Scientifique (CNRS), MICrobiologie de l'ALImentation au Service de la Santé (MICALIS), AgroParisTech-Université Paris-Saclay-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Laboratoire de Physique et Physiologie Intégratives de l’Arbre en environnement Fluctuant (PIAF), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Université Clermont Auvergne (UCA), Université libre de Bruxelles (ULB), H2020 European Research Council (685778), INRAE, Statistique pour le Vivant et l’Homme (SVH), Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Institut de Biologie et de Médecine Moléculaires [Gosselies] (ULB/IBMM), Faculté des Sciences [Bruxelles] (ULB), Université libre de Bruxelles (ULB)-Faculté de Médecine [Bruxelles] (ULB) |
Publication year: | 2021 |
Subjects: |
AcademicSubjects/SCI01140
AcademicSubjects/SCI01060 Protein family Viral protein Ecology (disciplines) AcademicSubjects/SCI00030 Standard Article Computational biology Biology AcademicSubjects/SCI01180 medicine.disease_cause 03 medical and health sciences Annotation medicine [SDV.BV]Life Sciences [q-bio]/Vegetal Biology Homology (anthropology) Cluster analysis 030304 developmental biology Sequence (medicine) 0303 health sciences prokaryotic virus proteins [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB] MOBILE GENETIC ELEMENTS 030302 biochemistry & molecular biology homology Classification [SDV.MP.VIR]Life Sciences [q-bio]/Microbiology and Parasitology/Virology AcademicSubjects/SCI00980 [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] PHROG Reference genome |
Source: | NAR Genomics and Bioinformatics, 2021, 3 (3), 12 pp. ⟨10.1093/nargab/lqab067⟩ |
ISSN: | 2631-9268 |
Description: | Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in the number of distinct protein families encountered and in the sequence heterogeneity within each protein family. The recent increase in sequenced viral genomes offers a great opportunity to gain new insights into this diversity and, consequently, makes the development of annotation resources for functional and comparative analysis all the more urgent. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated with a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Across 17 473 reference (pro)viruses of prokaryotes, 868 340 of the 938 864 proteins were grouped into 38 880 clusters; this clustering proved to be two-fold deeper than one obtained with a classical strategy based on BLAST-like similarity searches, while the clusters remained homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6% of the total protein dataset) with 705 different annotation terms, grouped into 9 functional categories designed specifically for viruses. We hope PHROG will be a useful tool for annotating future prokaryotic viral sequences, thereby helping the scientific community to better understand the evolution and ecology of these entities. |
Database: | OpenAIRE |
External link: |
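The Description says the protein families were built by clustering on remote homology detected through HMM profile-profile comparisons. The Python sketch below illustrates only a generic merge step of that kind, under stated assumptions: proteins are assumed to be already grouped into initial clusters, and profile-profile scores between cluster profiles are assumed to be given as input (in practice such scores might come from a tool such as HH-suite's HHblits). Clusters whose score reaches a threshold are merged by single linkage (connected components via union-find). This is a hypothetical illustration, not the authors' pipeline; the function names, the `min_score` threshold and the score format are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent: Dict[str, str] = {x: x for x in items}
        self.size: Dict[str, int] = {x: 1 for x in self.parent}

    def find(self, x: str) -> str:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def merge_clusters(
    clusters: Dict[str, List[str]],
    profile_hits: Iterable[Tuple[str, str, float]],
    min_score: float,
) -> List[List[str]]:
    """Hypothetical helper: merge clusters linked by hits scoring >= min_score.

    clusters maps a cluster ID to its member protein IDs; profile_hits holds
    (cluster_a, cluster_b, score) triples, assumed to come from an all-vs-all
    profile-profile comparison. Every cluster ID in a hit must appear in clusters.
    """
    uf = UnionFind(clusters)
    for a, b, score in profile_hits:
        if score >= min_score:
            uf.union(a, b)
    # Pool the members of all initial clusters that share a root.
    merged: Dict[str, List[str]] = {}
    for cid, members in clusters.items():
        merged.setdefault(uf.find(cid), []).extend(members)
    # Deterministic output: sort members, then sort families by first member.
    return sorted((sorted(m) for m in merged.values()), key=lambda m: m[0])
```

For example, with initial clusters `{"c1": ["p1", "p2"], "c2": ["p3"], "c3": ["p4", "p5"], "c4": ["p6"]}`, hits `[("c1", "c2", 0.97), ("c2", "c3", 0.40), ("c3", "c4", 0.99)]` and `min_score=0.9`, the sketch returns two families, `["p1", "p2", "p3"]` and `["p4", "p5", "p6"]`. Single linkage is chosen here only for brevity: one strong hit is enough to join two clusters, which can chain unrelated families together, so a real pipeline may well use a more conservative graph-clustering method.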