Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark

Author:	Cameron M. Avelis, Elijah Roberts, Christopher H. Bohrer, Max Klein, Rati Sharma
Publication year:	2016
Subject:	0301 basic medicine; Statistics and Probability; Source code; media_common.quotation_subject; 02 engineering and technology; computer.software_genre; Biochemistry; 03 medical and health sciences; Spark (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Computer Simulation; Molecular Biology; media_common; Supplementary data; Microscopy; Database; Computational Biology; 020207 software engineering; Applications Notes; Computer Science Applications; Computational Mathematics; Open source license; 030104 developmental biology; Open source; Computational Theory and Mathematics; Informatics; Scalability; computer; Software
Source:	Bioinformatics. 33:303-305
ISSN:	1367-4811 1367-4803
Description:	Summary: Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Availability and Implementation: Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark Supplementary information: Supplementary data are available at Bioinformatics online.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::010c6b881cdc6fa8afe8b823c0d3ebc8 https://doi.org/10.1093/bioinformatics/btw614