Cloud MapReduce for Monte Carlo bootstrap applied to Metabolic Flux Analysis
Author: | Bernd Freisleben, Katharina Nöh, Tim Dörnemann, Ernst Juhnke, Michael Weitzel, Tolga Dalman, Wolfgang Wiechert |
---|---|
Year of publication: | 2013 |
Subject: | Speedup; Computer Networks and Communications; Computer science; Interface (Java); Monte Carlo method; Cloud computing; Context (language use); Parallel computing; Data type; Business Process Execution Language; Workflow; Hardware and Architecture; Scalability; Software |
Source: | Future Generation Computer Systems. 29:582-590 |
ISSN: | 0167-739X |
DOI: | 10.1016/j.future.2011.10.007 |
Description: | The MapReduce architectural pattern popularized by Google has been used successfully in several scientific applications. So far, however, MapReduce has rarely been employed in the field of Systems Biology. We investigate whether a MapReduce approach utilizing on-demand resources from a Cloud is suitable to perform simulation tasks in the area of Metabolic Flux Analysis (MFA). An Amazon ElasticMapReduce Cloud implementation of the parallel, parametric Monte Carlo bootstrap in the context of 13C-MFA is presented. The seamless integration of the application into a service-oriented, BPEL-based scientific workflow framework is shown. A comparison of a straightforward MapReduce implementation using the Hadoop streaming interface on various Amazon ElasticMapReduce instance types with a single CPU core computation reveals a speedup of 17 on 64 Amazon cores. I/O operations on many small files within the Reduce step were identified as the limiting step. By exploiting the Hadoop Java API, making use of built-in data types and tuning problem-specific Hadoop parameters, the I/O issues could be resolved. With the revised implementation, a speedup of up to 48 could be achieved on 64 Amazon cores. To investigate the runtimes of a realistic 13C-MFA analysis, 50,000 Monte Carlo samples with a typical metabolic network model have been performed on 20 virtual nodes in 24 h and 23 min at a total cost of $384. Our work demonstrates the possibility to perform scalable Systems Biology applications using Amazon's Cloud MapReduce service. Highlights: • Scientific workflows are an attractive option in Metabolic Flux Analysis. • A service-oriented approach for on-demand Cloud computing with Amazon is presented. • Compute-intensive, parallel Monte Carlo simulations can be realized using MapReduce. • A speedup of factor 48 is achieved on 64 cores compared to a single core computation. • A large-scale simulation is performed on 152 cores in 24 h and $384 total costs. |
Database: | OpenAIRE |
External link: |
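
The abstract describes a parallel, parametric Monte Carlo bootstrap expressed as a MapReduce job. The record does not include the authors' code, so the following is only a minimal sketch of the general pattern under stated assumptions: the toy model `y = theta * x`, the constants `THETA_HAT` and `SIGMA`, and all function names are hypothetical stand-ins for a real 13C-MFA fit, and the job is run sequentially in one process rather than on Hadoop.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical toy model y = theta * x, standing in for a 13C-MFA flux fit.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
THETA_HAT = 2.0  # assumed best-fit parameter from the original data
SIGMA = 0.1      # assumed measurement standard deviation


def fit_theta(xs, ys):
    """Least-squares estimate of theta for y = theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def map_sample(seed):
    """Map step: draw one synthetic data set and re-fit the model.

    Each sample depends only on its seed, so samples can run in parallel.
    """
    rng = random.Random(seed)
    ys = [THETA_HAT * x + rng.gauss(0.0, SIGMA) for x in X]
    yield "theta", fit_theta(X, ys)


def reduce_estimates(key, values):
    """Reduce step: summarize all re-fitted estimates of one parameter."""
    values = sorted(values)
    lo = values[int(0.025 * (len(values) - 1))]
    hi = values[int(0.975 * (len(values) - 1))]
    return key, statistics.mean(values), statistics.stdev(values), (lo, hi)


def run_bootstrap(n_samples):
    """Sequential stand-in for map, shuffle (group by key), and reduce."""
    grouped = defaultdict(list)
    for seed in range(n_samples):
        for key, value in map_sample(seed):
            grouped[key].append(value)
    return [reduce_estimates(k, v) for k, v in grouped.items()]


if __name__ == "__main__":
    for key, mean, sd, (lo, hi) in run_bootstrap(1000):
        print(f"{key}: mean={mean:.4f} sd={sd:.4f} 95% interval=({lo:.4f}, {hi:.4f})")
```

Because every map call is independent, the work divides cleanly across cores; only the reduce step has to gather results, which is consistent with the abstract's observation that I/O in the Reduce step was the limiting factor. For reference, the reported speedups on 64 cores correspond to parallel efficiencies of 17/64 ≈ 0.27 and 48/64 = 0.75.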