Implementation: Online ML / River / Preprocessing / StandardScaler
| Knowledge Sources | River, River Docs |
|---|---|
| Domains | Online_Learning Feature_Engineering Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Concrete tool for incrementally standardizing features to zero mean and unit variance using Welford's online algorithm, supporting both single-instance and mini-batch updates.
Description
The preprocessing.StandardScaler class maintains running statistics (count, mean, and variance) for each feature using Welford's online algorithm. When transform_one is called, it subtracts the running mean and divides by the running standard deviation for each feature, producing standardized values with approximately zero mean and unit variance.
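The single-instance update can be sketched in plain Python. This is the textbook formulation of Welford's algorithm for one feature, not River's exact internals; the helper name `welford_update` is illustrative.

```python
# Minimal sketch of Welford's online update for a single feature
# (textbook formulation; River tracks this per feature name).
def welford_update(count, mean, m2, new_value):
    """Update running count, mean, and sum of squared deviations (M2)."""
    count += 1
    delta = new_value - mean    # deviation from the old mean
    mean += delta / count       # shift the mean toward the new value
    delta2 = new_value - mean   # deviation from the updated mean
    m2 += delta * delta2        # accumulate squared deviations
    return count, mean, m2

# Population variance is M2 / count once at least one sample has been seen.
count, mean, m2 = 0, 0.0, 0.0
for v in [10.557, 9.100, 10.945]:
    count, mean, m2 = welford_update(count, mean, m2, v)
variance = m2 / count
```

The one-pass formulation avoids storing past samples and is numerically stabler than accumulating raw sums of squares.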
The class inherits from base.MiniBatchTransformer, which means it supports both single-instance methods (learn_one, transform_one) and mini-batch methods (learn_many, transform_many) that operate on Pandas DataFrames. The mini-batch update uses a parallel merge formula to correctly combine existing statistics with batch statistics.
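The parallel merge mentioned above can be sketched as the standard formula of Chan et al. for combining two disjoint samples' statistics; River uses an equivalent computation internally, and `merge_stats` here is an illustrative name.

```python
import statistics

# Hypothetical sketch of the parallel merge for (count, mean, population
# variance) of two disjoint samples, as used when folding a mini-batch
# into existing running statistics.
def merge_stats(n_a, mean_a, var_a, n_b, mean_b, var_b):
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Combine sums of squared deviations, then renormalize to a variance.
    m2 = var_a * n_a + var_b * n_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2 / n

# Check against computing over the concatenated sample directly.
a, b = [1, 2, 3], [4, 5, 6, 7]
n, mean, var = merge_stats(len(a), statistics.fmean(a), statistics.pvariance(a),
                           len(b), statistics.fmean(b), statistics.pvariance(b))
# (n, mean, var) matches statistics.pvariance(a + b) up to rounding.
```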
The scaler handles edge cases gracefully: if a feature has zero variance (standard deviation is zero), the transformed value is set to 0.0 to avoid division by zero. The with_std parameter controls whether scaling by standard deviation is applied; when set to False, only mean centering is performed.
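The zero-variance guard can be sketched as follows, assuming the documented behaviour; `transform_one` here is a standalone toy function, not River's method.

```python
import math

# Sketch of the transform step with the zero-variance guard.
def transform_one(x, means, variances):
    out = {}
    for name, value in x.items():
        std = math.sqrt(variances.get(name, 0.0))
        # Constant features have std == 0; emit 0.0 instead of dividing.
        out[name] = (value - means.get(name, 0.0)) / std if std > 0 else 0.0
    return out

print(transform_one({'a': 7.0}, {'a': 7.0}, {'a': 0.0}))  # {'a': 0.0}
```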
Internally, the running statistics are stored in collections.Counter (for counts) and collections.defaultdict(float) (for means and variances), making the scaler naturally handle features that appear and disappear dynamically.
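A simplified sketch of that storage layout (tracking only counts and means, omitting variances) shows why features can appear mid-stream without any special handling:

```python
import collections

# Per-feature state containers, mirroring the layout described above.
counts = collections.Counter()
means = collections.defaultdict(float)

def learn_one(x):
    # A feature seen for the first time starts from count 0 / mean 0.0,
    # so new features can appear at any point in the stream.
    for name, value in x.items():
        counts[name] += 1
        means[name] += (value - means[name]) / counts[name]

learn_one({'a': 1.0})
learn_one({'a': 3.0, 'b': 10.0})   # 'b' appears mid-stream
print(dict(means))  # {'a': 2.0, 'b': 10.0}
```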
Usage
Import this class when you need to:
- Standardize features before feeding them to gradient-based models like logistic regression.
- Build a pipeline where feature scaling precedes a classifier or regressor.
- Process streaming data where the feature statistics are not known in advance.
- Handle both single-instance and mini-batch data.
Code Reference
Source Location
| File | Lines |
|---|---|
| river/preprocessing/scale.py | L80-L249 |
Signature
class StandardScaler(base.MiniBatchTransformer):
def __init__(self, with_std=True) -> None
# Single-instance methods
def learn_one(self, x: dict)
def transform_one(self, x: dict) -> dict
# Mini-batch methods
def learn_many(self, X: pd.DataFrame)
def transform_many(self, X: pd.DataFrame) -> pd.DataFrame
Import
from river import preprocessing
scaler = preprocessing.StandardScaler()
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| with_std | bool | True | Whether to scale features to unit variance. If False, only mean centering is applied. |
| x (to learn_one/transform_one) | dict | (required) | Feature dictionary mapping feature names to numeric values. |
| X (to learn_many/transform_many) | pd.DataFrame | (required) | DataFrame where each column is a feature. |
Outputs
| Method | Return Type | Description |
|---|---|---|
| transform_one(x) | dict | Feature dictionary with standardized values (approximately zero mean and unit variance). |
| transform_many(X) | pd.DataFrame | DataFrame with standardized columns. |
Usage Examples
Basic single-instance usage:
from river import preprocessing
scaler = preprocessing.StandardScaler()
X = [
{'x': 10.557, 'y': 8.100},
{'x': 9.100, 'y': 8.892},
{'x': 10.945, 'y': 10.706},
]
for x in X:
scaler.learn_one(x)
print(scaler.transform_one(x))
# {'x': 0.0, 'y': 0.0}
# {'x': -0.999, 'y': 0.999}
# {'x': 0.937, 'y': 1.350}
In a pipeline:
from river import datasets, evaluate, linear_model, metrics, preprocessing
model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()
evaluate.progressive_val_score(datasets.Phishing(), model, metric)
# Accuracy: 88.96%
Mini-batch usage:
import pandas as pd
from river import preprocessing
scaler = preprocessing.StandardScaler()
X = pd.DataFrame({'x': [10.5, 9.1, 10.9], 'y': [8.1, 8.9, 10.7]})
scaler.learn_many(X)
print(scaler.transform_many(X))
Mean centering only (no std scaling):
from river import preprocessing
scaler = preprocessing.StandardScaler(with_std=False)
x = {'a': 5.0, 'b': 10.0}
scaler.learn_one(x)
print(scaler.transform_one(x))
# {'a': 0.0, 'b': 0.0}