Large Scale Explorative Oligonucleotide Probe Selection for Thousands of Genetic Groups on a Computing Grid: Application to Phylogenetic Probe Design Using a Curated Small Subunit Ribosomal RNA Gene Database

Authors: Mohieddine Missaoui, Antoine Mahul, Eric Peyretaillade, Pierre Peyret, David R.C. Hill, Nicolas Parisot, Jérémie Denonfoux, Sébastien Cipière, Faouzi Jaziri
Contributors: Laboratoire d'Informatique, de Modélisation et d'optimisation des Systèmes (LIMOS), Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-Université d'Auvergne - Clermont-Ferrand I (UdA)-SIGMA Clermont (SIGMA Clermont)-Ecole Nationale Supérieure des Mines de St Etienne (ENSM ST-ETIENNE)-Centre National de la Recherche Scientifique (CNRS), Conception, Ingénierie et Développement de l'Aliment et du Médicament (CIDAM), Université d'Auvergne - Clermont-Ferrand I (UdA), UFR Pharmacie, Laboratoire Microorganismes : Génome et Environnement (LMGE), Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-Université d'Auvergne - Clermont-Ferrand I (UdA)-Centre National de la Recherche Scientifique (CNRS), Centre Régional de Ressources Informatiques (CRRI), Clermont Université, SIGMA Clermont (SIGMA Clermont)-Université d'Auvergne - Clermont-Ferrand I (UdA)-Ecole Nationale Supérieure des Mines de St Etienne-Centre National de la Recherche Scientifique (CNRS)-Université Blaise Pascal - Clermont-Ferrand 2 (UBP)
Language: English
Year of publication: 2014
Subject:
Science (General)
Article Subject
lcsh:Medicine
Biology
computer.software_genre
[SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy
lcsh:Technology
General Biochemistry, Genetics and Molecular Biology
03 medical and health sciences
Q1-390
Software
Phylogenetics
lcsh:Science
Selection (genetic algorithm)
Phylogeny
030304 developmental biology
General Environmental Science
Oligonucleotide Array Sequence Analysis
0303 health sciences
[SDV.GEN]Life Sciences [q-bio]/Genetics
Phylogenetic tree
Database
030306 microbiology
business.industry
lcsh:T
lcsh:R
Computational Biology
Genes, rRNA
General Medicine
Supercomputer
Grid
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
lcsh:Q
[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Scale (map)
business
Oligomer restriction
Databases, Nucleic Acid
Oligonucleotide Probes
computer
Algorithms
Research Article
Source: The Scientific World Journal, Vol 2014 (2014)
The Scientific World Journal
The Scientific World Journal, 2014, 2014, pp.350487. ⟨10.1155/2014/350487⟩
OpenAIRE
DOAJ-Articles
Europe PubMed Central
The Scientific World Journal, Hindawi Publishing Corporation, 2014, 2014, pp.350487. ⟨10.1155/2014/350487⟩
ISSN: 2356-6140
1537-744X
DOI: 10.1155/2014/350487
Description: International audience; Phylogenetic Oligonucleotide Arrays (POAs) were recently adapted for studying large microbial communities in a flexible and easy-to-use way. POAs coupled with explorative probes, which detect the unknown part of these communities, are now one of the most powerful approaches for better understanding how microbial communities function. However, probe selection remains a very difficult task. The rapid growth of environmental databases has led to an exponential increase in the data that must be managed for an efficient design. Consequently, the use of high-performance computing facilities is mandatory. In this paper, we present an efficient parallelization method to select known and explorative oligonucleotide probes at large scale using computing grids. We implemented software that generates and monitors thousands of jobs over the European Computing Grid Infrastructure (EGI). We also developed a new algorithm for constructing a high-quality curated phylogenetic database, to avoid erroneous designs caused by incorrect sequence affiliation. We present the performance and statistics of our method on real biological datasets, based on a phylogenetic prokaryotic database at the genus level and a complete design of about 20,000 probes for 2,069 genera of prokaryotes.
Database: OpenAIRE