Implementation:DistrictDataLabs Yellowbrick Dataset Download
| Knowledge Sources | |
|---|---|
| Domains | Datasets, Utilities |
| Last Updated | 2026-02-08 05:00 GMT |
Overview
Utility function for downloading, verifying, and extracting Yellowbrick example datasets from a hosted store.
Description
The download_data function downloads a zipped dataset from a URL, verifies the archive's SHA256 hash against the expected signature, and, when extract=True, extracts the archive into the data home directory. The archive is downloaded incrementally, and a dataset that already exists locally is not downloaded again unless replace=True.
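The download, verify, and extract steps described above can be sketched with the Python standard library. This is a minimal illustration and not Yellowbrick's source: fetch_verify_extract and CHUNK are hypothetical names, and details such as returning the archive path or deleting a mismatched archive are assumptions made for the example.

```python
import hashlib
import os
import zipfile
from urllib.request import urlopen

CHUNK = 64 * 1024  # hypothetical chunk size, chosen for illustration


def fetch_verify_extract(url, signature, data_home, replace=False, extract=True):
    """Hypothetical helper: download a zip, check its SHA256, and unzip it."""
    os.makedirs(data_home, exist_ok=True)
    archive = os.path.join(data_home, os.path.basename(url))

    # Skip the download when the archive is already present and replace is False.
    if os.path.exists(archive) and not replace:
        return archive

    # Stream the response to disk chunk by chunk, hashing as the bytes arrive.
    digest = hashlib.sha256()
    with urlopen(url) as response, open(archive, "wb") as f:
        for chunk in iter(lambda: response.read(CHUNK), b""):
            f.write(chunk)
            digest.update(chunk)

    # Reject the archive if its digest differs from the expected signature.
    if digest.hexdigest() != signature:
        os.remove(archive)
        raise ValueError("SHA256 mismatch for {}".format(url))

    # Extract next to the archive, inside the data home directory.
    if extract:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(path=data_home)
    return archive
```

Hashing while streaming avoids reading the archive a second time. A real implementation may instead hash the file after it has been written.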
Usage
Import this function when building custom dataset loading pipelines or when you need programmatic control over dataset downloads. Most users will use download_all() from yellowbrick.download instead.
Code Reference
Source Location
- Repository: DistrictDataLabs_Yellowbrick
- File: yellowbrick/datasets/download.py
- Lines: 1-110
Signature
def download_data(url, signature, data_home=None, replace=False, extract=True):
    """Downloads zipped dataset, verifies signature, and extracts archive."""
Import
from yellowbrick.datasets.download import download_data
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| url | str | Yes | URL to download dataset zip from |
| signature | str | Yes | Expected SHA256 hash of the zip |
| data_home | str | No | Local directory for datasets |
| replace | bool | No | Re-download even if exists (default: False) |
| extract | bool | No | Extract the zip (default: True) |
Outputs
| Name | Type | Description |
|---|---|---|
| (side effect) | Files | Downloaded and extracted dataset on disk |
Usage Examples
from yellowbrick.datasets.download import download_data

# The signature below is a truncated placeholder; pass the full SHA256 hex digest of the zip.
download_data(
    url="https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/mushroom.zip",
    signature="abc123...",
    data_home="~/.yellowbrick",
)
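The signature argument expects the SHA256 hex digest of the zip file. For a custom archive, one way to obtain that value is to hash the file locally. The sketch below uses only hashlib; sha256_of_file and its blocksize parameter are hypothetical names and are not part of the Yellowbrick API.

```python
import hashlib


def sha256_of_file(path, blocksize=65536):
    """Hypothetical helper: return the SHA256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks so large archives need not fit in memory.
        for block in iter(lambda: f.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()
```

The returned string can then be passed as the signature argument for the archive hosted at the corresponding URL.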