Environment:Online ml River Python Runtime Environment


Knowledge Sources
Domains: Infrastructure, Online_Learning
Last Updated: 2026-02-08 16:00 GMT

Overview

Python 3.11+ environment with NumPy, SciPy, and pandas as core dependencies for online machine learning.

Description

This environment provides the base runtime context for all River library functionality. River is a Python library for online machine learning whose performance-critical components are accelerated via Cython and Rust extensions. The core library requires Python 3.11 or higher and three foundational numerical computing packages: NumPy (>=2.3.4), SciPy (>=1.16), and pandas (>=2.2). All River modules, from classification to clustering to time series forecasting, depend on this base environment.

Usage

This environment is required by any code that imports `river` or one of its submodules. It is therefore a prerequisite for every Implementation in the River wiki, including classification pipelines, anomaly detection, drift-adaptive learning, online clustering, and time series forecasting.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux, macOS, or Windows | Cross-platform; Linux recommended for production |
| Python | >= 3.11 | Specified in `pyproject.toml` `requires-python` |
| Disk | ~50 MB | For library installation and dataset cache |
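
The Python floor listed above can be checked at program start. The following is a minimal sketch and not part of River; the names `MIN_PYTHON` and `python_is_supported` are hypothetical and chosen only for illustration.

```python
import sys

# Minimum interpreter version, taken from the requirements listed above.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version_info[:2]) >= MIN_PYTHON
```

A launcher script could call `python_is_supported()` before importing `river` and exit with a readable message when it returns `False`, rather than letting a later import fail.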

Dependencies

System Packages

No system-level packages are required for installing from pre-built wheels. For source installation, see Environment:Online_ml_River_Build_Toolchain.

Python Packages

  • `numpy` >= 2.3.4, < 3
  • `scipy` >= 1.16, < 2
  • `pandas` >= 2.2, < 3
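
The version bounds above can be verified against the installed distributions. The sketch below uses only the standard library (`importlib.metadata`); the helper names `release_tuple`, `in_range`, and `check_core_dependencies` are hypothetical. As a simplifying assumption it compares only the numeric release segment and ignores pre-release suffixes, so a tool such as `packaging` would be more rigorous.

```python
import re
from importlib import metadata

# Lower (inclusive) and upper (exclusive) bounds copied from the list above.
CORE_REQUIREMENTS = {
    "numpy": ("2.3.4", "3"),
    "scipy": ("1.16", "2"),
    "pandas": ("2.2", "3"),
}


def release_tuple(version):
    """Parse the leading numeric release, e.g. '2.3.4rc1' -> (2, 3, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def in_range(installed, minimum, upper):
    """Return True if minimum <= installed < upper on numeric release parts."""
    parts = [release_tuple(v) for v in (installed, minimum, upper)]
    width = max(len(p) for p in parts)
    # Pad with zeros so that '2.2' and '2.2.0' compare as equal.
    inst, low, high = (p + (0,) * (width - len(p)) for p in parts)
    return low <= inst < high


def check_core_dependencies():
    """Map each core package to 'ok', 'missing', or an out-of-range note."""
    report = {}
    for name, (minimum, upper) in CORE_REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if in_range(installed, minimum, upper):
            report[name] = "ok"
        else:
            report[name] = f"out of range: {installed}"
    return report
```

Calling `check_core_dependencies()` returns one entry per core package, which makes it straightforward to log or assert on in a CI step.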

Optional Dependencies (Feature-Gated)

These packages enable additional functionality but are not required for core usage:

  • `scikit-learn` >= 1.5.1 — Required for `river.compat` module (sklearn interoperability)
  • `sqlalchemy` >= 2.0 — Required for `river.stream.iter_sql` (database streaming)
  • `vaex` — Required for `river.stream.iter_vaex` (Vaex DataFrame streaming)
  • `polars` >= 1.1.0 — Required for `river.stream.iter_polars` (Polars DataFrame streaming)
  • `gymnasium` >= 0.29.0 — Required for `river.bandit.envs` (RL bandit environments)
  • `graphviz` >= 0.20.1 — Required for tree visualization (`draw()` method)
  • `requests` — Required for `river.stream.TwitterLiveStream`
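
Because these packages are optional, it can be useful to report which features are usable in a given environment before running a workflow. The following sketch probes for each import name with `importlib.util.find_spec`, which locates a module without importing it. The mapping and the function names `is_installed` and `available_features` are illustrative assumptions, not River APIs.

```python
from importlib.util import find_spec

# Import name -> River feature it unlocks, as listed above.
OPTIONAL_FEATURES = {
    "sklearn": "river.compat",
    "sqlalchemy": "river.stream.iter_sql",
    "vaex": "river.stream.iter_vaex",
    "polars": "river.stream.iter_polars",
    "gymnasium": "river.bandit.envs",
    "graphviz": "tree visualization (draw())",
    "requests": "river.stream.TwitterLiveStream",
}


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def available_features(features=OPTIONAL_FEATURES):
    """Return {feature: bool} describing which optional features are usable."""
    return {feature: is_installed(module) for module, feature in features.items()}
```

Note that the import name can differ from the distribution name: scikit-learn is installed as `scikit-learn` but imported as `sklearn`.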

Credentials

The following environment variables may be set:

  • `RIVER_DATA`: Directory for caching downloaded datasets. Defaults to `~/river_data`. Used by `river.datasets.base.get_data_home()`.

No API keys or tokens are required for core library functionality. Twitter and Twitch streaming features require bearer tokens passed as runtime arguments (not environment variables).
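
To preview where datasets would be cached under a given `RIVER_DATA` setting, the lookup described above can be mirrored in a few lines. This is a sketch under the stated default (`~/river_data`); the function `resolve_data_home` is hypothetical and, as a deliberate difference, it only computes the path and does not create the directory.

```python
import os
from pathlib import Path


def resolve_data_home(environ=None):
    """Return the cache directory implied by RIVER_DATA, without creating it."""
    if environ is None:
        environ = os.environ
    # Fall back to ~/river_data when the variable is unset.
    raw = environ.get("RIVER_DATA", os.path.join("~", "river_data"))
    return Path(raw).expanduser()
```

Passing a plain dictionary, for example `resolve_data_home({"RIVER_DATA": "/tmp/river_cache"})`, allows the resolution to be checked without modifying the process environment.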

Quick Install

# Install River with core dependencies
# (quote the specifier so the shell does not treat ">" as a redirect)
pip install "river>=0.23.0"

# Install with optional dependencies for full functionality
pip install "river>=0.23.0" "scikit-learn>=1.5.1" "sqlalchemy>=2.0" "gymnasium>=0.29.0" "graphviz>=0.20.1" "polars>=1.1.0"

Code Evidence

Environment variable for data directory from `river/datasets/base.py:26-33`:

def get_data_home():
    """Return the location where remote datasets are to be stored."""
    data_home = os.environ.get("RIVER_DATA", os.path.join("~", "river_data"))
    data_home = os.path.expanduser(data_home)
    if not os.path.exists(data_home):
        os.makedirs(data_home)
    return data_home

Optional import gating from `river/conftest.py:5-19`:

try:
    import sklearn  # noqa: F401
except ImportError:
    collect_ignore.append("compat/test_sklearn.py")

try:
    import sqlalchemy  # noqa: F401
except ImportError:
    collect_ignore.append("stream/iter_sql.py")
    collect_ignore.append("stream/test_sql.py")

try:
    import vaex  # noqa: F401
except ImportError:
    collect_ignore.append("stream/iter_vaex.py")

Gymnasium conditional registration from `river/bandit/envs/__init__.py:3-8`:

try:
    import gymnasium as gym
    GYM_INSTALLED = True
except ImportError:
    GYM_INSTALLED = False

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ModuleNotFoundError: No module named 'sklearn'` | scikit-learn not installed | `pip install "scikit-learn>=1.5.1"`; required for `river.compat` module |
| `ModuleNotFoundError: No module named 'sqlalchemy'` | SQLAlchemy not installed | `pip install "sqlalchemy>=2.0"`; required for `river.stream.iter_sql` |
| `ValueError: You have to install graphviz` | graphviz not installed | `pip install "graphviz>=0.20.1"`; required for tree visualization |
| `ModuleNotFoundError: No module named 'gymnasium'` | gymnasium not installed | `pip install "gymnasium>=0.29.0"`; required for bandit environments |
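
The `ModuleNotFoundError` cases above can be turned into actionable messages by inspecting the exception's `name` attribute. The sketch below is illustrative only: `INSTALL_HINTS` and `install_hint` are hypothetical names, and the graphviz case is left out because it is reported as a `ValueError` rather than a `ModuleNotFoundError`.

```python
# Import name -> pip requirement for the ModuleNotFoundError cases above.
INSTALL_HINTS = {
    "sklearn": "scikit-learn>=1.5.1",
    "sqlalchemy": "sqlalchemy>=2.0",
    "gymnasium": "gymnasium>=0.29.0",
}


def install_hint(error):
    """Return a quoted pip command for a known missing module, else None."""
    requirement = INSTALL_HINTS.get(getattr(error, "name", None))
    if requirement is None:
        return None
    # Quote the requirement so the shell does not treat '>' as a redirect.
    return f'pip install "{requirement}"'
```

A caller might wrap an optional import in `try`/`except ModuleNotFoundError as err` and print `install_hint(err)` when it is not `None`, re-raising the original exception otherwise.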

Compatibility Notes

  • Python 3.10: Not supported. The `pyproject.toml` specifies `requires-python >= 3.11`; the ruff linter configuration targets Python 3.10, but that setting affects linting only and does not lower the package's runtime requirement.
  • Windows: Fully supported for pre-built wheels. Source builds require adjustments (no `-lm` math library link).
  • macOS: Fully supported. Apple Silicon (arm64) wheels available.
  • NumPy 2.x: River requires NumPy >= 2.3.4, which is part of the NumPy 2.x series. Older NumPy 1.x is not supported.
