The raise of machine learning hyperparameter constraints in Python code

Autor: Ingkarat Rak-amnouykit, Ana Milanova, Guillaume Baudart, Martin Hirzel, Julian Dolby
Přispěvatelé: Rensselaer Polytechnic Institute (RPI), Parallélisme de Kahn Synchrone ( Parkas), Département d'informatique - ENS Paris (DI-ENS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria), IBM T. J. Watson Research Centre
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: International Symposium on Software Testing and Analysis
ISSTA 2022-31st ACM SIGSOFT International Symposium on Software Testing and Analysis
ISSTA 2022-31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Jul 2022, Virtual, South Korea. ⟨10.1145/3533767.3534400⟩
DOI: 10.1145/3533767.3534400⟩
Popis: International audience; Machine-learning operators often have correctness constraints that cut across multiple hyperparameters and/or data. Violating these constraints causes the operator to raise runtime exceptions, but those are usually documented only informally or not at all. This paper presents the first interprocedural weakest-precondition analysis for Python to extract hyperparameter constraints. The analysis is mostly static, but to make it tractable for typical Python idioms in machine-learning libraries, it selectively switches to the concrete domain for some cases. This paper demonstrates the analysis by extracting hyperparameter constraints for 181 operators from a total of 8 ML libraries, where it achieved high precision and recall and found real bugs. Our technique advances static analysis for Python and is a step towards safer and more robust machine learning.
Databáze: OpenAIRE