Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark

Authors: Cameron M. Avelis, Elijah Roberts, Christopher H. Bohrer, Max Klein, Rati Sharma
Year of publication: 2016
Source: Bioinformatics, 33:303-305
ISSN: 1367-4811, 1367-4803
Description:
Summary: Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open-source Hadoop and Spark projects, bringing domain-specific features for biology.
Availability and Implementation: Source code is licensed under the Apache 2.0 open-source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark
Supplementary information: Supplementary data are available at Bioinformatics online.
Database: OpenAIRE