Implementation:DistrictDataLabs Yellowbrick Datasaurus Plot
| Knowledge Sources | |
|---|---|
| Domains | Visualization, Data_Science |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
A concrete tool, provided by the Yellowbrick library, for rendering the Datasaurus Dozen demonstration plot, which illustrates why visualization is essential to data analysis.
Description
The datasaurus module embeds four datasets from the Datasaurus Dozen (Matejka & Fitzmaurice, CHI 2017). These datasets have nearly identical summary statistics (mean, variance, correlation) yet produce dramatically different scatter plots. The module's datasaurus() function renders a 2x2 grid in which each panel shows one dataset as a scatter plot with a linear best-fit line.
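To make the "same statistics, different shapes" idea concrete, here is a minimal sketch using only the Python standard library. The `summary` helper and both toy point sets are illustrative assumptions; they are not taken from the Yellowbrick module or from the Datasaurus Dozen data.

```python
import math
from statistics import fmean, pvariance


def summary(points):
    """Return mean, population variance, and Pearson correlation for (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    vx, vy = pvariance(xs), pvariance(ys)
    # Population covariance, then Pearson r = cov / (sd_x * sd_y)
    cov = fmean([(x - mx) * (y - my) for x, y in points])
    r = cov / math.sqrt(vx * vy)
    return {"mean_x": mx, "mean_y": my, "var_x": vx, "var_y": vy, "corr": r}


# Toy data (hypothetical, not from the Datasaurus Dozen):
# the corners of a square and the tips of a plus sign.
a = math.sqrt(2)
square = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
plus = [(-a, 0), (a, 0), (0, -a), (0, a)]

stats_square = summary(square)
stats_plus = summary(plus)
```

Both dictionaries agree up to floating-point rounding, even though the points are arranged differently. The Datasaurus Dozen makes the same point with far more striking shapes.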
Usage
Import this function for teaching or demonstration contexts where you want to illustrate the importance of data visualization beyond summary statistics.
Code Reference
Source Location
- Repository: DistrictDataLabs_Yellowbrick
- File: yellowbrick/datasaurus.py
- Lines: 1-1235
Signature
def datasaurus():
    """
    Creates 2x2 grid plot of 4 Datasaurus Dozen datasets.
    Each subplot shows scatter points and a linear best fit line.

    Returns
    -------
    axa, axb, axc, axd : tuple of matplotlib.axes.Axes
    """
Import
from yellowbrick.datasaurus import datasaurus
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | Function takes no arguments |
Outputs
| Name | Type | Description |
|---|---|---|
| axes | tuple of 4 Axes | The four matplotlib Axes objects (axa, axb, axc, axd) |
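Because the function returns the four Axes rather than the Figure, any follow-up customization goes through that tuple. The sketch below shows one possible way to title the panels and save the result. The `title_axes` and `demo` helpers, the placeholder panel titles, and the output filename are all hypothetical and not part of Yellowbrick.

```python
def title_axes(axes, titles):
    """Set one title per Axes and return the figure they share."""
    if len(axes) != len(titles):
        raise ValueError("need exactly one title per Axes")
    for ax, title in zip(axes, titles):
        ax.set_title(title)
    # Every matplotlib Axes exposes its parent Figure as `.figure`
    return axes[0].figure


def demo(path="datasaurus.png"):
    """Render the grid, title the panels, and save the figure to `path`."""
    # Imported lazily so the helper above stays usable without Yellowbrick
    from yellowbrick.datasaurus import datasaurus

    axes = datasaurus()
    # Placeholder titles (assumption): the real dataset names are not used here
    fig = title_axes(axes, ["Dataset A", "Dataset B", "Dataset C", "Dataset D"])
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    return path
```

Calling `demo()` requires both Yellowbrick and matplotlib to be installed. `title_axes` itself only relies on the `set_title` method and the `figure` attribute of each Axes.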
Usage Examples
from yellowbrick.datasaurus import datasaurus
import matplotlib.pyplot as plt
# Display the Datasaurus Dozen demo
axes = datasaurus()
plt.tight_layout()
plt.show()