Will Epperson
/ PhD Student at CMU

Guided Statistical Workflows with Interactive Explanations and Assumption Checking

Yuqi Zhang, Adam Perer, Will Epperson

GuidedStats is a Jupyter extension that helps data scientists perform statistical analyses with guided workflows.

Abstract

Statistical practices such as building regression models or running hypothesis tests produce valid results only when the analyst follows a rigorous sequence of steps and verifies assumptions about the data. However, common statistical tools do not check users’ decisions, and they provide low-level statistical functions without guidance on the analysis as a whole. Users can easily misuse analysis methods, which can undermine the validity of their results. To address this problem, we introduce GuidedStats, an interactive interface within computational notebooks that encapsulates guidance, models, visualization, and exportable results into interactive workflows. It breaks down typical analysis processes, such as linear regression and two-sample t-tests, into interactive steps supplemented with automatic visualizations and explanations for step-wise evaluation. Users can iterate on input choices to refine their models, while recommended actions and exports allow them to continue their analysis in code. Case studies show how GuidedStats offers valuable instructions for conducting fluid statistical analyses while finding possible assumption violations in the underlying data, supporting flexible and accurate statistical analyses.
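To make the idea of a step-wise workflow with assumption checks concrete, here is a minimal sketch in plain Python of a two-sample t-test that checks its assumptions before choosing a test variant. It is not the GuidedStats API: the helper `guided_two_sample_t`, the `TTestResult` class, and the thresholds `max_var_ratio` and `min_n` are hypothetical names and rule-of-thumb values chosen for illustration only.

```python
from dataclasses import dataclass, field
from math import sqrt
from statistics import mean, variance


@dataclass
class TTestResult:
    statistic: float
    dof: float
    method: str  # "student" (pooled variance) or "welch"
    warnings: list = field(default_factory=list)


def guided_two_sample_t(a, b, max_var_ratio=4.0, min_n=30):
    """Hypothetical helper: check assumptions, then pick a t-test variant.

    Both thresholds are illustrative rules of thumb, not GuidedStats defaults.
    Each sample needs at least two observations.
    """
    warnings = []
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)

    # Step 1: flag small samples, where normality matters most.
    if min(n1, n2) < min_n:
        warnings.append("small sample: check normality before trusting the result")

    # Step 2: equal-variance check using a variance-ratio heuristic.
    if min(v1, v2) == 0:
        equal_var = v1 == v2
    else:
        equal_var = max(v1, v2) / min(v1, v2) <= max_var_ratio

    # Step 3: choose the test variant based on the check above.
    if equal_var:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = sqrt(pooled * (1 / n1 + 1 / n2))
        dof = n1 + n2 - 2
        method = "student"
    else:
        warnings.append("unequal variances: using Welch's t-test")
        se = sqrt(v1 / n1 + v2 / n2)
        # Welch-Satterthwaite approximation of the degrees of freedom.
        dof = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
        method = "welch"

    if se == 0:
        raise ValueError("both samples are constant; the t statistic is undefined")

    return TTestResult((mean(a) - mean(b)) / se, dof, method, warnings)
```

Calling `guided_two_sample_t([1, 2, 3, 4, 5], [0, 10, 20, 30, 40])` would take the Welch branch and attach warnings about sample size and unequal variances. A guided tool surfaces checks like these at each step, along with visualizations and explanations, rather than leaving them to the analyst.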

Citation

Guided Statistical Workflows with Interactive Explanations and Assumption Checking
Yuqi Zhang, Adam Perer, Will Epperson
VIS 24: IEEE Conference on Data Visualization (VIS). St Pete Beach, Florida, 2024.