Implementation:Online ml River Datasets AirlinePassengers
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Machine Learning, Time Series Forecasting | 2026-02-08 16:00 GMT |
Overview
Concrete tool for loading the AirlinePassengers and WaterFlow benchmark time series datasets as sequential observation streams for online forecasting evaluation.
Description
The datasets.AirlinePassengers and datasets.WaterFlow classes provide built-in time series datasets that yield observations one at a time in chronological order. Both classes inherit from base.FileDataset and implement the iterator protocol via __iter__, which internally uses stream.iter_csv to parse the underlying CSV file with appropriate type conversions and date parsing.
AirlinePassengers contains 144 monthly observations of international airline passenger totals (in thousands) from January 1949 to December 1960. It has 1 feature (month as a datetime.date object) and an integer target representing the number of passengers.
WaterFlow contains 1,268 hourly observations of water flow (in liters per second) through a pipeline branch from March to May 2022. It has 1 feature (Time as a timezone-aware datetime) and a float target representing the flow rate. The dataset includes four anomalous segments suitable for testing forecaster robustness.
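The streaming behavior described above (one row parsed at a time, with a type conversion for the target and date parsing for the feature) can be approximated with the standard library alone. This is a minimal sketch, not River's implementation: `iter_observations`, the `passengers` column name, and the two-row in-memory CSV are assumptions made for illustration.

```python
import csv
import datetime as dt
import io

# Two sample rows in an assumed "month,passengers" layout (hypothetical file content).
SAMPLE_CSV = "month,passengers\n1949-01,112\n1949-02,118\n"


def iter_observations(buffer, target, converters, date_formats):
    """Yield (x, y) pairs one row at a time from a CSV buffer."""
    for row in csv.DictReader(buffer):
        # Apply per-column type conversions (e.g. str -> int).
        for name, convert in converters.items():
            row[name] = convert(row[name])
        # Parse date columns with their strptime format.
        for name, fmt in date_formats.items():
            row[name] = dt.datetime.strptime(row[name], fmt).date()
        # Split the target off from the remaining feature dict.
        y = row.pop(target)
        yield row, y


stream = iter_observations(
    io.StringIO(SAMPLE_CSV),
    target="passengers",
    converters={"passengers": int},
    date_formats={"month": "%Y-%m"},
)
first_x, first_y = next(stream)
print(first_x, first_y)
```

Because the function is a generator, only one row is held in memory at a time, which is the property that makes such datasets usable as observation streams.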
Usage
Import and iterate these datasets when you need a benchmark time series stream for evaluating online forecasting models such as SNARIMAX or HoltWinters.
Code Reference
Source Location
river/datasets/airline_passengers.py:L8-L35
river/datasets/water_flow.py:L8-L40
Signature
class AirlinePassengers(base.FileDataset):
    def __init__(self) -> None

class WaterFlow(base.FileDataset):
    def __init__(self) -> None
Import
from river import datasets
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| (none) | n/a | Both constructors take no parameters |
Outputs
| Output | Type | Description |
|---|---|---|
| Iterator element | (x: dict, y: number) | Each iteration yields a tuple of feature dict and target value |
| AirlinePassengers x | {"month": datetime.date} | Parsed date for the month |
| AirlinePassengers y | int | Number of passengers (thousands) |
| WaterFlow x | {"Time": datetime} | Timezone-aware datetime |
| WaterFlow y | float | Water flow rate in liters per second |
Dataset metadata:
| Dataset | n_samples | n_features | Target Type | Temporal Resolution |
|---|---|---|---|---|
| AirlinePassengers | 144 | 1 | int | Monthly |
| WaterFlow | 1,268 | 1 | float | Hourly |
Usage Examples
Iterating over AirlinePassengers
from river import datasets

dataset = datasets.AirlinePassengers()

# Inspect only the first observation of the stream.
for x, y in dataset:
    print(x["month"], y)
    # x = {"month": datetime.date(1949, 1, 1)}, y = 112
    break
Iterating over WaterFlow
from river import datasets

dataset = datasets.WaterFlow()

# Inspect only the first observation of the stream.
for x, y in dataset:
    print(x["Time"], y)
    break
Using with a forecasting model
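from river import datasets
from river import time_series

dataset = datasets.AirlinePassengers()
model = time_series.SNARIMAX(p=12, d=1, q=12, m=12, sd=1)

# Update the model with each target value in chronological order.
for x, y in dataset:
    model.learn_one(y)

# Once the whole stream has been seen, forecast the next 12 months.
forecast = model.forecast(horizon=12)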
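To evaluate a forecaster on such a stream rather than only fit it, each observation can be scored before the model learns from it (predict first, then learn). The sketch below illustrates that loop with the standard library only. `SeasonalNaive` is a hypothetical stand-in that mirrors the `learn_one(y)` / `forecast(horizon)` calls used above; it is not a River class, and the series is synthetic, not real passenger data.

```python
from collections import deque


class SeasonalNaive:
    """Hypothetical stand-in forecaster: repeats the values seen one season ago."""

    def __init__(self, season_length):
        self.season_length = season_length
        self.history = deque(maxlen=season_length)

    def learn_one(self, y):
        # Keep only the most recent full season of values.
        self.history.append(y)

    def forecast(self, horizon):
        # Cycle through the last season to cover the requested horizon.
        past = list(self.history)
        return [past[i % len(past)] for i in range(horizon)]


# Synthetic series with an exact period of 4 (illustrative values only).
series = [10, 20, 30, 20] * 5

model = SeasonalNaive(season_length=4)
abs_errors = []
for y in series:
    # Predict first, then learn: each value is scored before the model sees it.
    if len(model.history) == model.season_length:
        y_pred = model.forecast(horizon=1)[0]
        abs_errors.append(abs(y - y_pred))
    model.learn_one(y)

mae = sum(abs_errors) / len(abs_errors)
print(f"MAE over {len(abs_errors)} scored steps: {mae}")
```

The same loop shape should carry over to a real forecaster and dataset by swapping in the model and iterating over `(x, y)` pairs; the warm-up check would then depend on what the chosen model needs before it can forecast.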