Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark
Authors: | Cameron M. Avelis, Elijah Roberts, Christopher H. Bohrer, Max Klein, Rati Sharma |
---|---|
Year of publication: | 2016 |
Subject: | 0301 basic medicine; Statistics and Probability; Source code; media_common.quotation_subject; 02 engineering and technology; computer.software_genre; Biochemistry; 03 medical and health sciences; Spark (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Computer Simulation; Molecular Biology; media_common; Supplementary data; Microscopy; Database; Computational Biology; 020207 software engineering; Applications Notes; Computer Science Applications; Computational Mathematics; Open source license; 030104 developmental biology; Open source; Computational Theory and Mathematics; Informatics; Scalability; computer; Software |
Source: | Bioinformatics. 33:303-305 |
ISSN: | 1367-4811 1367-4803 |
Description: | Summary: Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Availability and Implementation: Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark Supplementary information: Supplementary data are available at Bioinformatics online. |
Database: | OpenAIRE |