Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization
Author: | Wentao Wu, Kevin Schawinski, Frances Ann Hubis, Ce Zhang, Bojan Karlas, Cedric Renggli |
---|---|
Contributors: | Chen, Lei; Ozcan, Fatma; Quamar, Abdul; Tong, Yongxin |
Language: | English |
Year of publication: | 2019 |
Subject: |
Iterative and incremental development; Computer science; Generalization; Data management; General Engineering; Software development; Overfitting; Software; Test set; Software engineering; Implementation; 01 natural sciences; 0103 physical sciences; 010303 astronomy & astrophysics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems |
Source: | Proceedings of the VLDB Endowment, 12 (12) |
ISSN: | 2150-8097 |
Description: | Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate within a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study that provides principled guidelines for this iterative process. However, as of today, the counterpart of "software engineering for ML" is largely missing: developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ci is a continuous integration engine for ML models, and meter is a "profiler" for controlling overfitting of ML models. Both systems focus on managing the "statistical generalization power" of the datasets used for assessing the quality of ML applications, namely, the validation set and the test set. By demonstrating these two systems, we hope to spawn further discussions within our community on building this new type of data management system for statistical generalization. |
Database: | OpenAIRE |
External link: |