Principle:Online ml River Time Series Stream Loading

From Leeroopedia
Revision as of 17:45, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Online_ml_River_Time_Series_Stream_Loading.md)


Knowledge Sources: River, River Docs
Domains: Online Machine Learning, Time Series Forecasting
Last Updated: 2026-02-08 16:00 GMT

Overview

Technique for loading benchmark time series datasets as sequential observation streams for incremental forecasting evaluation.

Description

Time series data naturally arrives one observation at a time in chronological order. In the River library, built-in dataset classes wrap standard benchmark time series and expose them as iterables that yield (x, y) tuples in chronological order, where x is a dictionary of features (typically just a timestamp) and y is the numeric target value. This streaming interface matches the online learning setting: each observation is consumed once, in order, and the full dataset never has to be held in memory.

River provides several built-in time series datasets with known characteristics:

  • AirlinePassengers: A classic monthly dataset containing 144 observations of international airline passenger totals from January 1949 to December 1960. The series exhibits a clear upward trend and multiplicative yearly seasonality, making it an ideal benchmark for testing seasonal forecasting models.
  • WaterFlow: An hourly dataset containing 1,268 observations of water flow through a pipeline branch measured in liters per second, spanning March to May 2022. This dataset includes anomalous segments caused by maintenance interventions and pumping operations, making it suitable for evaluating forecaster robustness.

Both datasets inherit from base.FileDataset, which handles file I/O and provides metadata such as the number of samples and features.
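As a minimal sketch of the stream interface described above, the snippet below reads the first few observations of AirlinePassengers. It assumes River is installed; if it is not, it falls back to a hand-written three-item list with the same (x, y) shape so the loop still runs. The helper name head is hypothetical and not part of River.

```python
import datetime as dt
from itertools import islice


def head(stream, n=3):
    """Return the first n (x, y) pairs without reading the rest of the stream."""
    return list(islice(stream, n))


try:
    # Assumption: River is installed and bundles this dataset.
    from river import datasets

    stream = datasets.AirlinePassengers()
except ImportError:
    # Fallback stand-in: first three months of the classic series,
    # shaped like the (x, y) tuples described above.
    stream = [
        ({"month": dt.datetime(1949, 1, 1)}, 112),
        ({"month": dt.datetime(1949, 2, 1)}, 118),
        ({"month": dt.datetime(1949, 3, 1)}, 132),
    ]

for x, y in head(stream):
    # x is a dict of features (here a timestamp), y is the numeric target.
    print(x, y)
```

Because only the requested items are pulled from the stream, the same pattern works for a dataset of any length. River dataset objects also carry metadata such as n_samples and n_features, which can be inspected before iterating.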

Usage

Use time series stream loading when:

  • You need a standardized benchmark to evaluate or compare online forecasting models
  • You want to simulate real-time data arrival for testing incremental learning pipelines
  • You need a dataset with known seasonal patterns (e.g., yearly seasonality in the monthly AirlinePassengers series, recurring patterns in the hourly WaterFlow series)
  • You are prototyping a new forecasting workflow and need a quick, self-contained data source

Theoretical Basis

Time series forecasting fundamentally depends on the sequential nature of data. Unlike cross-sectional data where observations are independent, time series observations exhibit temporal dependencies: the value at time t depends on values at t-1, t-2, and so on. This autocorrelation structure is what forecasting models exploit.
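To make the idea of autocorrelation concrete, here is a small sketch that computes the sample autocorrelation of a series at a given lag. The function name autocorr and the toy series are illustrative assumptions, not part of River; the series is assumed to be non-constant so the variance is non-zero.

```python
def autocorr(ys, lag):
    """Sample autocorrelation of ys at the given lag (assumes ys is not constant)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys)
    cov = sum((ys[t] - mean) * (ys[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Toy series with a repeating pattern of length 4.
ys = [1, 2, 3, 4] * 6

print(autocorr(ys, 4))  # strongly positive: values one full cycle apart move together
print(autocorr(ys, 2))  # negative: values half a cycle apart move in opposite directions
```

A large positive value at the seasonal lag is the kind of structure a seasonal forecaster relies on.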

In a streaming context, data arrives as a sequence:

(x_1, y_1), (x_2, y_2), ..., (x_t, y_t), ...

At each step t, the model:

  1. Optionally produces a forecast for future steps
  2. Observes the true value y_t
  3. Updates its internal state using y_t (and optionally x_t)
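The three steps above can be sketched as a plain loop. SeasonalNaive is a hypothetical toy forecaster written for this illustration; its forecast and learn_one method names are chosen to resemble River's forecaster interface, but the class itself is not part of River, and the stream is a small hand-written list.

```python
from collections import deque


class SeasonalNaive:
    """Hypothetical toy forecaster: repeats the value seen one season ago."""

    def __init__(self, period):
        self.period = period
        self.history = deque(maxlen=period)  # keeps only the last `period` values

    def forecast(self, horizon):
        hist = list(self.history)
        if not hist:
            return [0.0] * horizon       # nothing seen yet
        if len(hist) < self.period:
            return [hist[-1]] * horizon  # less than one season: repeat last value
        # hist[0] is the value observed exactly one season before the next step.
        return [hist[h % self.period] for h in range(horizon)]

    def learn_one(self, y):
        self.history.append(y)


# Toy stream of (x, y) tuples with a period of 3 and a slow upward drift.
stream = [({"t": t}, y) for t, y in enumerate([10, 20, 30, 12, 22, 32, 14, 24])]

model = SeasonalNaive(period=3)
errors = []
for x, y in stream:
    y_pred = model.forecast(horizon=1)[0]  # 1. forecast the next value
    errors.append(abs(y - y_pred))         # 2. observe the true value y_t
    model.learn_one(y)                     # 3. update the model state with y_t

print(errors)
```

Forecasting before learning is what keeps the evaluation honest: every error is measured on a value the model had not yet seen.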

Built-in datasets provide this stream interface via Python iterators, ensuring that the data is consumed in the correct temporal order. The choice of benchmark matters:

  • Seasonal patterns: AirlinePassengers has seasonal period m = 12 (monthly observations, yearly cycle), so models must capture both the trend and the multiplicative seasonality.
  • Anomaly robustness: WaterFlow contains anomalous segments that test a model's ability to adapt without being destabilized by outliers.

The streaming paradigm also enables walk-forward validation (closely related to what the online learning literature calls progressive validation): at each time step the model forecasts values it has not yet seen, the forecast is scored against the true observations, and only then is the model updated with them.
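A minimal sketch of walk-forward evaluation over a multi-step horizon follows. The names walk_forward_mae and last_value are hypothetical, and the series is held in a list for clarity; a streaming implementation would buffer only the next few observations instead of slicing the full history.

```python
def walk_forward_mae(ys, forecast_fn, horizon):
    """Mean absolute error per forecast step, using only past values at each origin."""
    abs_err = [0.0] * horizon
    counts = [0] * horizon
    for origin in range(1, len(ys)):
        preds = forecast_fn(ys[:origin], horizon)  # the model sees only the past
        actuals = ys[origin:origin + horizon]      # may be shorter near the end
        for h, (pred, actual) in enumerate(zip(preds, actuals)):
            abs_err[h] += abs(actual - pred)
            counts[h] += 1
    return [e / c if c else float("nan") for e, c in zip(abs_err, counts)]


def last_value(past, horizon):
    """Baseline forecaster: repeat the most recent observation."""
    return [past[-1]] * horizon


mae_per_step = walk_forward_mae([1, 2, 3, 4, 5], last_value, horizon=2)
print(mae_per_step)  # [1.0, 2.0]
```

Reporting the error separately for each step ahead shows how quickly forecast quality degrades with the horizon. River's time_series module provides its own evaluation utilities for this purpose; consult its documentation for the exact signatures.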
