Heuristic:Online ml River SNARIMAX Default Regressor Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Online_Learning, Parameter_Tuning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
SNARIMAX defaults to a `StandardScaler | LinearRegression` pipeline as its internal regressor; swap in a task-specific model when the data calls for it.
Description
The SNARIMAX time series model internally uses a regressor to learn the mapping from lag features (AR terms), seasonal lag features, and error terms (MA terms) to the predicted value. When no regressor is explicitly provided, SNARIMAX constructs a default pipeline of `preprocessing.StandardScaler() | linear_model.LinearRegression()`. This default is a sensible baseline that handles feature scaling automatically, but users can substitute any River regressor or pipeline for improved performance on specific data characteristics.
Usage
Apply this heuristic when configuring SNARIMAX for time series forecasting. The default regressor works well when the target is close to a linear function of its lags and past errors, but it may underperform on non-linear patterns. For non-linear data, consider substituting a tree-based regressor (e.g., `tree.HoeffdingTreeRegressor`); for moderate non-linearity, consider adding polynomial feature expansion (e.g., `feature_extraction.PolynomialExtender`) ahead of the linear model.
The Insight (Rule of Thumb)
- Action: Know that SNARIMAX's default regressor is `StandardScaler | LinearRegression`, and consider replacing it for non-linear data.
- Value: Default pipeline: `preprocessing.StandardScaler() | linear_model.LinearRegression()`
- Trade-off:
  - Default (linear): Fast, interpretable, works well for linear trends and seasonality. May underfit non-linear patterns.
  - Tree regressor: Handles non-linear relationships but may overfit with small AR/MA orders.
  - Custom pipeline: Full flexibility (e.g., polynomial features + regression) but increases complexity.
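- Order selection: Start with `p=1, d=1, q=1` (i.e., ARIMA(1,1,1)) for most time series. Add seasonal terms (`sp`, `sq`) with a period `m` that matches the seasonal cycle (e.g., 12 for monthly data with a yearly cycle, 7 for daily data with a weekly cycle).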
- Differencing: `d=1` removes a linear trend; `d=2` is rarely needed and can cause instability. Seasonal differencing (`sd=1`) with period `m` removes a repeating seasonal pattern.
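The differencing rule of thumb can be checked by hand. The sketch below is plain Python, not River's internal differencer; the helper `difference` and both series are made up for illustration. It shows first differencing flattening a linear trend and lag-`m` differencing cancelling a pattern that repeats every `m` steps.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up series with a linear trend of slope 2.
trend = [3 + 2 * t for t in range(8)]
print(difference(trend))
# [2, 2, 2, 2, 2, 2, 2]  (the trend is gone; only the constant slope remains)

# Made-up series whose pattern repeats exactly every m = 4 steps.
m = 4
seasonal = [10, 20, 15, 5] * 3
print(difference(seasonal, lag=m))
# [0, 0, 0, 0, 0, 0, 0, 0]  (the seasonal pattern cancels out)
```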
Reasoning
The default `StandardScaler | LinearRegression` pipeline was chosen because:
- StandardScaler: Lag features (AR terms) and error terms (MA terms) can have different scales, especially after differencing. Scaling prevents the linear regression from being dominated by features with larger magnitudes.
- LinearRegression: The traditional ARIMA model is fundamentally a linear model. Using LinearRegression preserves this theoretical foundation. The "N" in SNARIMAX (Non-linear) comes from the ability to swap in a non-linear regressor, but the default maintains classical behavior.
- Pipeline composition: River's pipe operator (`|`) makes it trivial to compose preprocessing and modeling steps, which is why the default is a pipeline rather than a bare regressor.
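To make the scaling argument above concrete, here is a toy online standardizer in plain Python. It is only a sketch in the spirit of a running-statistics scaler, not River's `StandardScaler` implementation; the class name `RunningStandardizer`, the feature names, and the sample values are assumptions chosen for the example.

```python
import math


class RunningStandardizer:
    """Toy per-feature standardizer using Welford's running mean and variance."""

    def __init__(self):
        self.n = {}
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations from the mean

    def learn_one(self, x):
        for key, value in x.items():
            n = self.n.get(key, 0) + 1
            mean = self.mean.get(key, 0.0)
            delta = value - mean
            mean += delta / n
            self.n[key] = n
            self.mean[key] = mean
            self.m2[key] = self.m2.get(key, 0.0) + delta * (value - mean)

    def transform_one(self, x):
        out = {}
        for key, value in x.items():
            n = self.n.get(key, 0)
            var = self.m2.get(key, 0.0) / n if n else 0.0
            std = math.sqrt(var)
            out[key] = (value - self.mean.get(key, 0.0)) / std if std > 0 else 0.0
        return out


scaler = RunningStandardizer()
# A lag feature in the thousands next to an error feature in the tenths.
samples = [{"y-1": 1000.0, "e-1": 0.1}, {"y-1": 1200.0, "e-1": 0.3}]
for x in samples:
    scaler.learn_one(x)

scaled = scaler.transform_one(samples[-1])
print({k: round(v, 6) for k, v in scaled.items()})
# {'y-1': 1.0, 'e-1': 1.0}
```

After standardization both features sit on the same scale, so a gradient-based linear model is no longer dominated by the feature with the larger raw magnitude.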
Code Evidence
SNARIMAX default regressor from `river/time_series/snarimax.py:292-296`:

    self.regressor = (
        regressor
        if regressor is not None
        else preprocessing.StandardScaler() | linear_model.LinearRegression()
    )
SNARIMAX lag feature construction from `river/time_series/snarimax.py:302-318`:

    def _add_lag_features(self, x, Y, errors):
        if x is None:
            x = {}

        # AR
        for t in range(self.p):
            try:
                x[f"y-{t+1}"] = Y[t]
            except IndexError:
                break

        # Seasonal AR
        for t in range(self.m - 1, self.m * self.sp, self.m):
            try:
                x[f"sy-{t+1}"] = Y[t]
            except IndexError:
                break
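To see which feature names the two loops above produce, the following standalone sketch restates them as a free function. The name `lag_features`, the sample history, and the assumption that `Y` is ordered most recent first are all illustrative and are not taken from the excerpt.

```python
def lag_features(Y, p, m, sp):
    """Standalone restatement of the AR and seasonal-AR loops shown above.

    Assumes Y is ordered most recent first, so Y[0] is the previous value.
    """
    x = {}
    # AR: the p most recent values.
    for t in range(p):
        if t >= len(Y):
            break  # not enough history yet
        x[f"y-{t + 1}"] = Y[t]
    # Seasonal AR: values at multiples of the period m, up to sp of them.
    for t in range(m - 1, m * sp, m):
        if t >= len(Y):
            break
        x[f"sy-{t + 1}"] = Y[t]
    return x


# 24 made-up values, most recent first: 100, 99, ..., 77.
history = list(range(100, 76, -1))
print(lag_features(history, p=2, m=12, sp=2))
# {'y-1': 100, 'y-2': 99, 'sy-12': 89, 'sy-24': 77}

# With a short history the loops stop early instead of failing.
print(lag_features([5.0], p=2, m=12, sp=1))
# {'y-1': 5.0}
```

The early exits mirror the `IndexError` handling in the excerpt: while the history is still short, the regressor simply sees fewer features.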