Environment:DistrictDataLabs Yellowbrick Python Scikit Learn Environment

From Leeroopedia


Knowledge Sources
Domains Data_Science, Visualization, Machine_Learning
Last Updated 2026-02-08 05:00 GMT

Overview

Cross-platform Python 3.4+ environment (CI-tested on 3.8 and 3.9) with matplotlib, scikit-learn, scipy, numpy, and cycler for machine learning visualization.

Description

This environment provides the core runtime for all Yellowbrick visualizers. It is a pure CPU-based Python environment built around the scikit-learn ecosystem. No GPU or specialized hardware is required. The package depends on five core libraries: matplotlib for rendering, scikit-learn for ML model integration, scipy for statistical computations, numpy for numerical arrays, and cycler for color cycling. The CI matrix tests against Python 3.8 and 3.9 on Ubuntu, macOS, and Windows.

Usage

Use this environment for all Yellowbrick workflows. Every visualizer in the library (classifier, regressor, cluster, feature, and model selection) requires it as its base.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Cross-platform (Linux, macOS, Windows) | CI tests on ubuntu-latest, macos-latest, windows-latest |
| Python | >= 3.4, < 4 (CI tests 3.8, 3.9) | Declared in setup.py `python_requires` |
| Hardware | Standard CPU | No GPU required; pure Python + numpy computations |
| Disk | ~50 MB | Package + bundled datasets |
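
The Python constraint can be checked before installing. The following is a minimal sketch, assuming the bounds from the `python_requires` string quoted under Code Evidence (`>=3.4, <4`); the function name `python_supported` is hypothetical and not part of Yellowbrick.

```python
import sys

# Bounds taken from the python_requires string quoted on this page (">=3.4, <4").
MIN_PYTHON = (3, 4)
MAX_PYTHON_EXCLUSIVE = (4, 0)


def python_supported(version_info=None):
    """Return True if the (major, minor) version falls inside the declared range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_supported())
```

Passing the declared range only means the package will install; the CI matrix covers 3.8 and 3.9 specifically.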

Dependencies

System Packages

No OS-level system packages are required beyond a standard Python installation.

Python Packages

Core (required):

  • `matplotlib` >= 2.0.2, != 3.0.0
  • `scipy` >= 1.0.0
  • `scikit-learn` >= 1.0.0
  • `numpy` >= 1.16.0
  • `cycler` >= 0.10.0
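
To see whether an existing environment already meets these minimums, something like the sketch below may help. It assumes Python 3.8+ (for `importlib.metadata`) and uses a deliberately simple version parser that ignores pre-release tags; the helper names `version_tuple`, `satisfies`, and `check_core` are hypothetical, and pip's own resolver remains the authoritative check.

```python
import re
from importlib import metadata

# Constraints copied from the core list above: (minimum, excluded versions).
CORE_REQUIREMENTS = {
    "matplotlib": ("2.0.2", {"3.0.0"}),
    "scipy": ("1.0.0", set()),
    "scikit-learn": ("1.0.0", set()),
    "numpy": ("1.16.0", set()),
    "cycler": ("0.10.0", set()),
}


def version_tuple(version):
    """Turn '1.16.0' or '3.8.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, minimum, excluded=()):
    """Check a version string against a minimum and a set of excluded versions."""
    if installed in excluded:
        return False
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '1.16' compares equal to '1.16.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_core(requirements=CORE_REQUIREMENTS):
    """Map each package name to 'ok', 'too old or excluded', or 'missing'."""
    report = {}
    for name, (minimum, excluded) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, minimum, excluded)
        report[name] = "ok" if ok else "too old or excluded"
    return report


if __name__ == "__main__":
    for package, status in check_core().items():
        print(package, status)
```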

Optional (for extended functionality):

  • `nltk` >= 3.2 (text visualizers: PosTagVisualizer, FreqDistVisualizer)
  • `pandas` >= 1.0.4 (DataFrame support in loaders and visualizers)
  • `umap-learn` >= 0.5 (UMAPVisualizer text embedding)
  • `numba` >= 0.55 (required by umap-learn)
  • `spacy` >= 2.0.18 (alternative NLP backend for PosTagVisualizer)
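
Because these packages are optional, it can be useful to detect which ones are present before relying on the visualizers that need them. A minimal sketch using only the standard library follows; the mapping and the function name `available_optionals` are illustrative assumptions, not part of Yellowbrick.

```python
from importlib.util import find_spec

# Import names for the optional packages listed above. Note that the
# umap-learn distribution is imported as "umap".
OPTIONAL_MODULES = {
    "nltk": "nltk",
    "pandas": "pandas",
    "umap-learn": "umap",
    "numba": "numba",
    "spacy": "spacy",
}


def available_optionals(modules=OPTIONAL_MODULES):
    """Return {distribution name: True/False} without importing the packages."""
    return {dist: find_spec(mod) is not None for dist, mod in modules.items()}


if __name__ == "__main__":
    for dist, present in available_optionals().items():
        print(dist, "installed" if present else "not installed")
```

This only reports presence; it does not verify the minimum versions listed above.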

Testing:

  • `pytest` >= 6.1
  • `pytest-cov` >= 2.10
  • `coverage` >= 5.3

Credentials

No API keys or environment variables are required for core functionality.

Optional:

  • `YELLOWBRICK_DATA`: Override the default dataset storage location. If not set, datasets are stored in the package install directory under `yellowbrick/datasets/fixtures/`.

Quick Install

# Install core package (pulls all required dependencies)
pip install yellowbrick

# Install with optional dependencies for full functionality
pip install yellowbrick nltk pandas umap-learn

# Download NLTK data (required for text visualizers)
python -m nltk.downloader popular

Code Evidence

Python version constraint from `setup.py:167`:

config = {
    ...
    "python_requires": ">=3.4, <4"
}

Core dependencies from `requirements.txt:2-6`:

## Dependencies
matplotlib>=2.0.2,!=3.0.0
scipy>=1.0.0
scikit-learn>=1.0.0
numpy>=1.16.0
cycler>=0.10.0

Matplotlib version check from `yellowbrick/style/rcmod.py:29-31`:

from distutils.version import LooseVersion
mpl_ge_150 = LooseVersion(mpl.__version__) >= "1.5.0"

YELLOWBRICK_DATA environment variable from `yellowbrick/datasets/path.py:39-52`:

def get_data_home(path=None):
    if path is None:
        path = os.environ.get("YELLOWBRICK_DATA", FIXTURES)
    path = os.path.expanduser(path)
    path = os.path.expandvars(path)
    if not os.path.exists(path):
        os.makedirs(path)
    return path
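
The lookup order in this excerpt (explicit argument, then `YELLOWBRICK_DATA`, then the built-in default) can be tried out in isolation. The sketch below is a standalone re-implementation for illustration, not the library function: `resolve_data_home` is a hypothetical name, its `default` argument stands in for the `FIXTURES` constant, and unlike the excerpt it does not create the directory.

```python
import os


def resolve_data_home(path=None, default="~/yellowbrick_data"):
    """Mirror the excerpt's lookup order: argument, then env var, then default.

    `default` is a hypothetical stand-in for the library's FIXTURES constant.
    """
    if path is None:
        path = os.environ.get("YELLOWBRICK_DATA", default)
    path = os.path.expanduser(path)
    path = os.path.expandvars(path)
    return path


if __name__ == "__main__":
    os.environ["YELLOWBRICK_DATA"] = "/srv/yb-data"
    print(resolve_data_home())             # value of the environment variable
    print(resolve_data_home("/data/yb"))   # an explicit argument takes priority
    del os.environ["YELLOWBRICK_DATA"]
    print(resolve_data_home())             # falls back to the expanded default
```

In the real function the environment variable is read on each call, so setting it before the first dataset load is sufficient.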

CI matrix from `.github/workflows/ci.yml:24-26`:

matrix:
  python-version: [3.8, 3.9]
  os: [ubuntu-latest, macos-latest, windows-latest]
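
GitHub Actions runs one job for every combination of matrix values, so this matrix yields six jobs. As a rough illustration of that expansion (the `expand_matrix` helper is hypothetical and only models the Cartesian product, not `include`/`exclude` rules):

```python
from itertools import product

# Values copied from the workflow excerpt above.
MATRIX = {
    "python-version": ["3.8", "3.9"],
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


jobs = expand_matrix(MATRIX)
for job in jobs:
    print(job)
```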

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `DatasetsError: could not find dataset at ... - does it need to be downloaded?` | Dataset not present in data home | Run `python -m yellowbrick.download` or set `YELLOWBRICK_DATA` to the correct path |
| `ImportError: cannot import name 'calinski_harabaz_score'` | The code imports the legacy spelling, which scikit-learn >= 0.23 no longer provides (the function was renamed to `calinski_harabasz_score`) | Import `calinski_harabasz_score` instead, or upgrade Yellowbrick to a release that handles both names |
| `ImportError: cannot import name 'safe_indexing'` | scikit-learn >= 0.24 no longer exposes `safe_indexing` (it is now `_safe_indexing`) | Upgrade Yellowbrick to >= 1.4 (includes compatibility fix for sklearn issue #1137) |
| Matplotlib 3.0.0 excluded | Known compatibility issue | Use any supported matplotlib version except 3.0.0 (declared in requirements) |

Compatibility Notes

  • matplotlib 3.0.0: Explicitly excluded in requirements (`!=3.0.0`) due to a known breaking change. Any other version >= 2.0.2 is supported.
  • matplotlib < 1.5.0: Uses the legacy `axes.color_cycle` rcParam instead of `axes.prop_cycle` with cycler. Both paths are handled in `yellowbrick/style/rcmod.py`, although versions this old fall below the declared minimum of 2.0.2.
  • matplotlib 1.4.2: Has a bug where points become invisible without edge width. Yellowbrick applies a workaround that sets `lines.markeredgewidth = 0.01` for this specific version, which is likewise below the declared minimum.
  • scikit-learn API changes: Multiple try/except blocks handle renamed functions across sklearn versions (e.g., `calinski_harabaz_score` vs `calinski_harabasz_score`, `safe_indexing` vs `_safe_indexing`).
  • Windows: Fully supported via CI. UMAP text visualizer may have issues on 32-bit Windows Python 2.7 (legacy warning).
  • Conda: Supported via conda-forge channel. CI tests both pip and conda installation paths.
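
The renamed-function handling mentioned in the scikit-learn note follows a common fallback pattern: try the current name first, then the legacy one. The sketch below shows the same idea with a generic helper; `import_first` is a hypothetical function written for this page and is not Yellowbrick's actual code, which the page describes as using try/except blocks.

```python
from importlib import import_module


def import_first(module_name, candidates):
    """Return the first attribute in `candidates` that `module_name` provides."""
    module = import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "none of {} found in {}".format(", ".join(candidates), module_name)
    )


# Example (requires scikit-learn), newer name first, legacy spelling second:
# score = import_first(
#     "sklearn.metrics", ["calinski_harabasz_score", "calinski_harabaz_score"]
# )

# Standard-library demonstration: the first name does not exist, the second does.
sqrt = import_first("math", ["no_such_function", "sqrt"])
print(sqrt(9))
```

Listing the current name first means the legacy spelling is only used on older installations.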
