Robust Cross-Platform Workflows: How Technical and Scientific Communities Collaborate to Develop, Test and Share Best Practices for Data Analysis

Authors: Lars Wirzenius, Andrea Bagnacani, Fabian Klötzl, Andreas Tille, Matúš Kalaš, Petter Reinholdtsen, Brad Chapman, Michael R. Crusoe, Stuart W. Prescott, Steffen Möller, Stian Soiland-Reyes, Pjotr Prins
Contributors: Pichon, Fabien, Struckmann, Stephan, Fuellen, Georg
Language: English
Publication year: 2017
Subject:
0301 basic medicine
Common workflow language
Continuous integration testing
Computer science
Windows Workflow Foundation
automated installation
debian
Computational Mechanics
Workflow engine
lcsh:QA75.5-76.95
Workflow technology
World Wide Web
03 medical and health sciences
Software distribution
Container
Common Workflow Language
lcsh:T58.5-58.64
business.industry
lcsh:Information technology
Software development
cwl
Data science
Computer Science Applications
Reference data
Matematikk og naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420 [VDP]
030104 developmental biology
Workflow
Mathematics and natural sciences: 400::Information and communication science: 420 [VDP]
lcsh:Electronic computers. Computer science
business
Workflow management system
Source: Data Science and Engineering, Vol 2, Iss 3, Pp 232-244 (2017)
Data Science and Engineering
Möller, S, Prescott, S W, Wirzenius, L, Reinholdtsen, P, Chapman, B, Prins, P, Soiland-Reyes, S, Klötzl, F, Bagnacani, A, Kalaš, M, Tille, A & Crusoe, M R 2017, 'Robust cross-platform workflows: How technical and scientific communities collaborate to develop, test and share best practices for data analysis', Data Science and Engineering. https://doi.org/10.1007/s41019-017-0050-4
ISSN: 2364-1541
2364-1185
DOI: 10.1007/s41019-017-0050-4
Description: Information integration and workflow technologies for data analysis have always been major fields of investigation in bioinformatics. A range of popular workflow suites is available to support analyses in computational biology. Commercial providers tend to offer prepared applications that run remotely from their clients. However, for most academic environments with local expertise, novel data collection techniques or novel data analysis, it is essential to have all the flexibility of open source tools and open source workflow descriptions. Workflows in data-driven science such as computational biology have gained considerably in complexity. New tools and new releases with additional features arrive at an enormous pace, and new reference data and concepts for quality control are emerging. A well-abstracted workflow and its exchange across work groups have an enormous impact on the efficiency of research and the further development of the field. High-throughput sequencing adds to the avalanche of data available in the field; efficient computation and, in particular, parallel execution motivate the transition from traditional scripts and Makefiles to workflows. Here we review the extant software development and distribution model with a focus on the role of integration testing, and discuss the effect of the Common Workflow Language (CWL) on distributions of open source scientific software to swiftly and reliably provide the tools demanded for the execution of such formally described workflows. We contend that, relieved of technical differences between execution on local machines, clusters or the cloud, communities also gain the technical means to test workflow-driven interaction across several software packages.
Database: OpenAIRE