Tea: A High-level Language and Runtime System for Automating Statistical Analysis
Author: | René Just, Maureen Daum, Emery D. Berger, Jared Roesch, Katharina Reinecke, Eunice Jun, Sarah Chasins |
---|---|
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Computer science; Computer Science - Human-Computer Interaction; 02 engineering and technology; Machine learning; computer.software_genre; Human-Computer Interaction (cs.HC); Set (abstract data type); Runtime system; 0202 electrical engineering, electronic engineering, information engineering; False positive paradox; 0501 psychology and cognitive sciences; 050107 human factors; Constraint satisfaction problem; Statistical hypothesis testing; Declarative programming; Parametric statistics; Computer Science - Programming Languages; business.industry; Suite; 05 social sciences; 020207 software engineering; Computer Science - Mathematical Software; Artificial intelligence; business; computer; Mathematical Software (cs.MS); Programming Languages (cs.PL) |
Source: | UIST |
DOI: | 10.48550/arxiv.1904.05387 |
Description: | Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests. Comment: 11 pages |
Database: | OpenAIRE |
External link: |
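
The description says that Tea determines the set of valid statistical tests from the study design and the assumptions that hold. The sketch below illustrates only that idea, using a plain set-inclusion filter rather than a real constraint solver. It is not Tea's API or implementation: the names `CandidateTest`, `CANDIDATES`, and `valid_tests`, the property labels, and the small test catalogue are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """A statistical test and the properties it needs (hypothetical model)."""
    name: str
    requires: frozenset  # every property listed here must hold


# Hypothetical, heavily simplified catalogue of two-group comparisons.
CANDIDATES = [
    CandidateTest(
        "Student's t-test",
        frozenset({"two_groups", "independent_samples", "normal", "equal_variance"}),
    ),
    CandidateTest(
        "Welch's t-test",
        frozenset({"two_groups", "independent_samples", "normal"}),
    ),
    CandidateTest(
        "Mann-Whitney U",
        frozenset({"two_groups", "independent_samples"}),
    ),
    CandidateTest(
        "Wilcoxon signed-rank",
        frozenset({"two_groups", "paired_samples"}),
    ),
]


def valid_tests(properties):
    """Return the names of tests whose requirements are all satisfied.

    `properties` is the set of facts that hold for the study design and
    data, for example after checking normality on the sample.
    """
    held = frozenset(properties)
    return [test.name for test in CANDIDATES if test.requires <= held]


if __name__ == "__main__":
    # Normality not established: only the non-parametric test remains.
    print(valid_tests({"two_groups", "independent_samples"}))
    # Normality holds, equal variance does not: Welch's test becomes valid too.
    print(valid_tests({"two_groups", "independent_samples", "normal"}))
```

When a parametric assumption such as `normal` is absent from the input, the parametric tests drop out of the result and only the non-parametric option is left, which mirrors the fallback behaviour the description reports.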