Scalable Computing for Evolutionary Genomics

Author:	Steffen Möller, Dominique Belhachemi, Geert Smant, Pjotr Prins
Language:	English
Year of publication:	2012
Subject:	Computer science; Bioinformatics; Legacy system; Cloud computing; Evolutionary biology; Big data; Software; Computer cluster; Software system; Laboratorium voor Nematologie; EPS-2; Software as a service; Software development; Parallelization; VirtualBox; Virtualization; Virtual machine; PAML; MrBayes; OpenStack; Debian Linux; Software construction; Operating system; Amazon EC2; Software design; MPI; BioNode; Laboratory of Nematology; Cluster computing
Source:	Methods in Molecular Biology; ISBN: 9781617795848; Evolutionary Genomics. Statistical and Computational Methods, Volume 2. Humana Press
Description:	Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can be deployed flexibly on multiple computers in a local network (e.g., on existing desktop PCs) and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel as processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster and pipeline in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, whenever required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages of interest to evolutionary biology are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software.
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OS X, Linux, and in the Cloud. In addition to the downloadable BioNode images, we provide online tutorials that enable bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives on creating and building such images.
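As a rough illustration of the "poor man's parallelization" mentioned in the description, the following Python sketch runs several whole programs at once as separate processes. It is not taken from the chapter: the function names and the placeholder command are hypothetical, and a local thread pool stands in for the job scheduler the authors describe.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one whole program as a separate OS process and return its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_in_parallel(commands, max_workers=4):
    """Launch independent commands concurrently; results keep the input order."""
    # The threads only wait on child processes, so the actual work runs in
    # parallel inside those separate processes, not in this interpreter.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


def make_placeholder_command(dataset):
    # Hypothetical stand-in for invoking a real analysis program on one dataset.
    return [sys.executable, "-c", f"print('analysed {dataset}')"]


if __name__ == "__main__":
    datasets = ["gene_family_1", "gene_family_2", "gene_family_3"]
    commands = [make_placeholder_command(d) for d in datasets]
    for line in run_in_parallel(commands):
        print(line)
```

In practice each placeholder command would be replaced by the command line of an unmodified analysis program, one invocation per dataset or hypothesis, which is why this approach suits legacy software: the programs themselves need no changes.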
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6743e06174d8abac544732c8f32635b9 https://doi.org/10.1007/978-1-61779-585-5_22