Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization

Authors: Wentao Wu, Kevin Schawinski, Frances Ann Hubis, Ce Zhang, Bojan Karlas, Cedric Renggli
Contributors: Chen, Lei; Ozcan, Fatma; Quamar, Abdul; Tong, Yongxin
Language: English
Year of publication: 2019
Source: Proceedings of the VLDB Endowment, 12 (12)
ISSN: 2150-8097
Description: Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study that provides principled guidelines for this iterative process. Today, however, the counterpart of "software engineering for ML" is largely missing: developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ci is a continuous integration engine for ML models, and meter is a "profiler" for controlling overfitting of ML models. Both systems focus on managing the "statistical generalization power" of the datasets used to assess the quality of ML applications, namely the validation set and the test set. By demonstrating these two systems, we hope to spark further discussion within our community on building this new type of data management system for statistical generalization.
Database: OpenAIRE