Towards Effective Foraging by Data Scientists to Find Past Analysis Choices

Authors: Brad A. Myers, Mary Beth Kery, Bonnie E. John, Amber Horvath, Patrick O'Flaherty
Year of publication: 2019
Subject:
Source: CHI
DOI: 10.1145/3290605.3300322
Description: Data scientists are responsible for the analysis decisions they make, but it is hard for them to track the process by which they achieved a result. Even when data scientists keep logs, it is onerous to make sense of the resulting large number of history records full of overlapping variants of code, output, plots, etc. We developed algorithmic and visualization techniques for notebook code environments to help data scientists forage for information in their history. To test these interventions, we conducted a think-aloud evaluation with 15 data scientists, where participants were asked to find specific information from the history of another person's data science project. The participants succeeded on a median of 80% of the tasks they performed. The quantitative results suggest promising aspects of our design, while the qualitative results motivated a number of design improvements. The resulting system, called Verdant, is released as an open-source extension for JupyterLab.
Database: OpenAIRE