Popis: |
Machine learning (ML) is extensively used in production-ready applications, calling for mature engineering techniques to ensure robustdevelopment, deployment and maintenance. Given the potential negative impact ML can have on people, society or the environment,engineering techniques that can ensure robustness against technical errors and adversarial attacks are of considerable importance. Inthis work, we investigate how teams of experts develop, deploy and maintain software with ML components. Towards this goal, wemined both academic and grey literature, and compiled a catalogue of engineering practices for ML. We validated this catalogue usinga large-scale survey, which measured the degree of adoption of the practices and their perceived effects. Moreover, we ran validationinterviews with practitioners to add depth to the survey results. The catalogue covers a broad range of practices for engineeringsoftware systems with ML components and for ensuring non-functional properties that fall under the umbrella of trustworthy ML,such as robustness or fairness. Here, we present the results of our study, which indicate, for example, that larger and more experienced teams tend to adopt more practices, but that trustworthiness practices tend to be neglected. Moreover, we show that the effectsmeasured in our survey, such as team agility or accountability, can be predicted quite accurately from groups of practices. This allowedus to contrast the importance of the practices for these effects as well as adoption rates, revealing, for example, that widely adoptedpractices are, in practice, less important with respect to some effects. For instance, writing reusable scripts for data cleaning andmerging is highly adopted, but has a limited impact on reproducibility. Based on the results from our survey, we were also able todevelop a maturity model that measures the engineering ability of teams developing software with ML components and benchmark itagainst other teams in our data set. Overall, our study provides a quantitative assessment of ML engineering practices and their impacton desirable properties of software with ML components; it also opens multiple avenues for improving the adoption of useful practices. |