Enabling Collaborative Data Science Development with the Ballet Framework
Author: | Kalyan Veeramachaneni, Micah J. Smith, Kelvin Lu, Jürgen Cito |
---|---|
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences
Feature engineering; Computer Science - Machine Learning; Computer Networks and Communications; Computer science; Computer Science - Human-Computer Interaction; Cloud computing; 02 engineering and technology; Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Computer Science - Software Engineering; Software; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Software system; business.industry; Software development; 020207 software engineering; computer.file_format; Data science; Software Engineering (cs.SE); Human-Computer Interaction; Conceptual framework; Programming paradigm; Executable; business; computer; Social Sciences (miscellaneous) |
Source: | Proceedings of the ACM on Human-Computer Interaction. 5:1-39 |
ISSN: | 2573-0142 |
Description: | While the open-source software development model has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small teams. We describe challenges to scaling data science collaborations and present a conceptual framework and ML programming model to address them. We instantiate these ideas in Ballet, the first lightweight framework for collaborative, open-source data science through a focus on feature engineering, and an accompanying cloud-based development environment. Using our framework, collaborators incrementally propose feature definitions to a repository which are each subjected to software and ML performance validation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct a case study analysis of an income prediction problem with 27 collaborators, and discuss implications for future designers of collaborative projects. |
Database: | OpenAIRE |
External link: |