Principle:DistrictDataLabs Yellowbrick Statistical Quartet Visualization
| Knowledge Sources | |
|---|---|
| Domains | Data_Science, Visualization |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
Principle of using datasets with nearly identical summary statistics but very different distributions to demonstrate why visualization is essential for data analysis.
Description
Anscombe's Quartet (1973) and the Datasaurus Dozen (Matejka & Fitzmaurice, 2017) are canonical examples showing that datasets can share nearly identical means, variances, correlations, and regression lines while having fundamentally different scatter plot patterns. This observation is a core motivation for visual analytics: summary statistics alone are insufficient for understanding data.
Usage
Use this principle in teaching and communication contexts to justify the importance of exploratory data visualization before modeling. It provides the foundational motivation for tools like Yellowbrick.
Theoretical Basis
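Given datasets $D_1, \dots, D_k$ where, for all pairs $(i, j)$, the summary statistics approximately agree: $\operatorname{mean}(D_i) \approx \operatorname{mean}(D_j)$, $\operatorname{var}(D_i) \approx \operatorname{var}(D_j)$, and $\operatorname{corr}(D_i) \approx \operatorname{corr}(D_j)$.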
Yet the scatter plots of each dataset reveal distinct structures (linear, quadratic, outlier-driven, clustered, etc.).
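The agreement in summary statistics can be checked numerically. The following is a minimal sketch using only the Python standard library and the published Anscombe (1973) values. The names `ANSCOMBE` and `summarize` are illustrative and are not part of Yellowbrick.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson correlation, and the OLS line."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows are close to one another, which is exactly why the differences only become apparent once each dataset is drawn as a scatter plot.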