Environment:Online ml River Python Runtime Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Online_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Python 3.11+ environment with NumPy, SciPy, and pandas as core dependencies for online machine learning.
Description
This environment provides the base runtime context for all River library functionality. River is a Python online machine learning library whose performance-critical components are accelerated via Cython and Rust extensions. The core library requires Python 3.11 or higher and three foundational numerical computing packages: NumPy (>=2.3.4), SciPy (>=1.16), and pandas (>=2.2). All River modules, from classification to clustering to time series forecasting, depend on this base environment.
Usage
Any code that imports `river` or one of its submodules requires this environment. It is the prerequisite for running every Implementation in the River wiki, including classification pipelines, anomaly detection, drift-adaptive learning, online clustering, and time series forecasting.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | Cross-platform; Linux recommended for production |
| Python | >= 3.11 | Specified in `pyproject.toml` `requires-python` |
| Disk | ~50 MB | For library installation and dataset cache |
Dependencies
System Packages
No system-level packages are required for installing from pre-built wheels. For source installation, see Environment:Online_ml_River_Build_Toolchain.
Python Packages
- `numpy` >= 2.3.4, < 3
- `scipy` >= 1.16, < 2
- `pandas` >= 2.2, < 3
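The version bounds above can be checked at startup with a short script. The following is a minimal sketch that uses only the standard library; the helper names `parse_version`, `in_range`, and `check_core_dependencies` are illustrative and not part of River, and the simple numeric parsing ignores pre-release suffixes.

```python
import sys
from importlib import metadata

# Inclusive lower bound and exclusive upper bound, copied from the list above.
CORE_REQUIREMENTS = {
    "numpy": ((2, 3, 4), (3,)),
    "scipy": ((1, 16), (2,)),
    "pandas": ((2, 2), (3,)),
}


def parse_version(text):
    """Turn '2.3.4' into (2, 3, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': keep the 0, drop the suffix
            break
    return tuple(parts)


def in_range(installed, low, high):
    """True when low <= installed < high, using tuple comparison."""
    return low <= installed < high


def check_core_dependencies():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11 or higher is required")
    for name, (low, high) in CORE_REQUIREMENTS.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not in_range(installed, low, high):
            problems.append(f"{name} {installed} is outside the supported range")
    return problems


if __name__ == "__main__":
    for problem in check_core_dependencies():
        print(problem)
```

Reading versions through `importlib.metadata` avoids importing the packages themselves, so the check stays cheap even when NumPy or pandas are large to load.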
Optional Dependencies (Feature-Gated)
These packages enable additional functionality but are not required for core usage:
- `scikit-learn` >= 1.5.1 — Required for `river.compat` module (sklearn interoperability)
- `sqlalchemy` >= 2.0 — Required for `river.stream.iter_sql` (database streaming)
- `vaex` — Required for `river.stream.iter_vaex` (Vaex DataFrame streaming)
- `polars` >= 1.1.0 — Required for `river.stream.iter_polars` (Polars DataFrame streaming)
- `gymnasium` >= 0.29.0 — Required for `river.bandit.envs` (RL bandit environments)
- `graphviz` >= 0.20.1 — Required for tree visualization (`draw()` method)
- `requests` — Required for `river.stream.TwitterLiveStream`
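To see which of these feature-gated modules are usable in a given environment, one option is to probe for each package without importing it. This is a sketch under the assumption that the import names match the list above (note that scikit-learn is imported as `sklearn`); `OPTIONAL_FEATURES` and `available_features` are hypothetical names, not River APIs.

```python
from importlib import util

# Import name -> River feature it unlocks, as listed above.
OPTIONAL_FEATURES = {
    "sklearn": "river.compat",
    "sqlalchemy": "river.stream.iter_sql",
    "vaex": "river.stream.iter_vaex",
    "polars": "river.stream.iter_polars",
    "gymnasium": "river.bandit.envs",
    "graphviz": "tree visualization (draw())",
    "requests": "river.stream.TwitterLiveStream",
}


def available_features():
    """Map each feature to True/False based on whether its package can be found."""
    return {
        feature: util.find_spec(module) is not None
        for module, feature in OPTIONAL_FEATURES.items()
    }


if __name__ == "__main__":
    for feature, ok in available_features().items():
        print(f"{'available' if ok else 'missing  '}  {feature}")
```

This only reports whether a package is importable; it does not verify the minimum versions given in the list.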
Credentials
The following environment variables may be set:
- `RIVER_DATA`: Directory for caching downloaded datasets. Defaults to `~/river_data`. Used by `river.datasets.base.get_data_home()`.
No API keys or tokens are required for core library functionality. Twitter and Twitch streaming features require bearer tokens passed as runtime arguments (not environment variables).
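The effect of `RIVER_DATA` can be illustrated without importing River. The sketch below mirrors the lookup described above (environment variable first, then `~/river_data`) but, unlike the library function, does not create the directory; `resolve_data_home` is a hypothetical helper introduced here for illustration.

```python
import os


def resolve_data_home(environ=None):
    """Resolve the dataset cache path from RIVER_DATA, falling back to ~/river_data."""
    environ = os.environ if environ is None else environ
    raw = environ.get("RIVER_DATA", os.path.join("~", "river_data"))
    return os.path.expanduser(raw)


if __name__ == "__main__":
    # With the variable set, the override is used as-is.
    print(resolve_data_home({"RIVER_DATA": "/tmp/river_cache"}))
    # Without it, the default under the user's home directory applies.
    print(resolve_data_home({}))
```

Passing a plain dictionary instead of `os.environ` keeps the example free of side effects on the running process.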
Quick Install
# Install River with core dependencies (quotes keep the shell from treating > as a redirect)
pip install "river>=0.23.0"
# Install with optional dependencies for full functionality
pip install "river>=0.23.0" "scikit-learn>=1.5.1" "sqlalchemy>=2.0" "gymnasium>=0.29.0" "graphviz>=0.20.1" "polars>=1.1.0"
Code Evidence
Environment variable for data directory from `river/datasets/base.py:26-33`:
def get_data_home():
    """Return the location where remote datasets are to be stored."""
    data_home = os.environ.get("RIVER_DATA", os.path.join("~", "river_data"))
    data_home = os.path.expanduser(data_home)
    if not os.path.exists(data_home):
        os.makedirs(data_home)
    return data_home
Optional import gating from `river/conftest.py:5-19`:
try:
    import sklearn  # noqa: F401
except ImportError:
    collect_ignore.append("compat/test_sklearn.py")
try:
    import sqlalchemy  # noqa: F401
except ImportError:
    collect_ignore.append("stream/iter_sql.py")
    collect_ignore.append("stream/test_sql.py")
try:
    import vaex  # noqa: F401
except ImportError:
    collect_ignore.append("stream/iter_vaex.py")
Gymnasium conditional registration from `river/bandit/envs/__init__.py:3-8`:
try:
    import gymnasium as gym
    GYM_INSTALLED = True
except ImportError:
    GYM_INSTALLED = False
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'sklearn'` | scikit-learn not installed | `pip install "scikit-learn>=1.5.1"`; required for `river.compat` module |
| `ModuleNotFoundError: No module named 'sqlalchemy'` | SQLAlchemy not installed | `pip install "sqlalchemy>=2.0"`; required for `river.stream.iter_sql` |
| `ValueError: You have to install graphviz` | graphviz not installed | `pip install "graphviz>=0.20.1"`; required for tree visualization |
| `ModuleNotFoundError: No module named 'gymnasium'` | gymnasium not installed | `pip install "gymnasium>=0.29.0"`; required for bandit environments |
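The `ModuleNotFoundError` rows of the table can be turned into actionable messages by inspecting the exception's `name` attribute. The following is a sketch only: `INSTALL_HINTS` and `install_hint` are hypothetical names, and the graphviz case is left out because the table lists it as a `ValueError` rather than an import error.

```python
# Missing import name -> install command, taken from the table above.
INSTALL_HINTS = {
    "sklearn": 'pip install "scikit-learn>=1.5.1"',
    "sqlalchemy": 'pip install "sqlalchemy>=2.0"',
    "gymnasium": 'pip install "gymnasium>=0.29.0"',
}


def install_hint(exc):
    """Return a pip command for a known missing module, or None if unknown."""
    # exc.name may be a dotted path such as 'sklearn.base'; use the top-level part.
    top_level = (exc.name or "").split(".")[0]
    return INSTALL_HINTS.get(top_level)


if __name__ == "__main__":
    try:
        import sklearn  # noqa: F401
    except ModuleNotFoundError as exc:
        print(install_hint(exc) or exc)
```

The install commands are quoted so that a shell does not interpret `>` in the version specifier as output redirection.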
Compatibility Notes
- Python 3.10: Not supported. `pyproject.toml` specifies `requires-python >= 3.11`; the ruff linter targets Python 3.10, but the package itself requires 3.11+.
- Windows: Fully supported for pre-built wheels. Source builds require adjustments (no `-lm` math library link).
- macOS: Fully supported. Apple Silicon (arm64) wheels available.
- NumPy 2.x: River requires NumPy >= 2.3.4, which is part of the NumPy 2.x series. Older NumPy 1.x is not supported.
Related Pages
- Implementation:Online_ml_River_Datasets_Phishing
- Implementation:Online_ml_River_Compose_Pipeline
- Implementation:Online_ml_River_Preprocessing_StandardScaler
- Implementation:Online_ml_River_Linear_Model_LogisticRegression
- Implementation:Online_ml_River_Tree_HoeffdingTreeClassifier
- Implementation:Online_ml_River_Metrics_Accuracy
- Implementation:Online_ml_River_Evaluate_Progressive_Val_Score
- Implementation:Online_ml_River_Anomaly_HalfSpaceTrees
- Implementation:Online_ml_River_Drift_ADWIN
- Implementation:Online_ml_River_Forest_ARFClassifier
- Implementation:Online_ml_River_Cluster_KMeans
- Implementation:Online_ml_River_Time_Series_SNARIMAX