
Implementation:Online ml River Preprocessing StandardScaler

From Leeroopedia


Knowledge Sources River River Docs
Domains Online_Learning Feature_Engineering Statistics
Last Updated 2026-02-08 16:00 GMT

Overview

Concrete tool for incrementally standardizing features to zero mean and unit variance using Welford's online algorithm, supporting both single-instance and mini-batch updates.

Description

The preprocessing.StandardScaler class maintains running statistics (count, mean, and variance) for each feature using Welford's online algorithm. When transform_one is called, it subtracts the running mean and divides by the running standard deviation for each feature, producing standardized values with approximately zero mean and unit variance.
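As a minimal sketch (helper names are illustrative, not River's internals), the per-feature Welford update and the subsequent standardization look like this:

```python
import math

def welford_update(count: int, mean: float, var: float, x: float):
    """Fold one new value x into the running count, mean, and (biased) variance."""
    count += 1
    delta = x - mean
    mean += delta / count
    var += (delta * (x - mean) - var) / count
    return count, mean, var

def standardize(x: float, mean: float, var: float) -> float:
    """Subtract the running mean and divide by the running standard deviation."""
    std = math.sqrt(var)
    return (x - mean) / std if std > 0 else 0.0  # guard against zero variance

count, mean, var = 0, 0.0, 0.0
for value in [10.557, 9.100, 10.945]:  # the 'x' feature from the usage example
    count, mean, var = welford_update(count, mean, var, value)

print(round(standardize(10.945, mean, var), 3))
# 0.937
```

The result matches the third standardized `'x'` value in the single-instance example further down, which suggests the running variance is the biased (population) form.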

The class inherits from base.MiniBatchTransformer, which means it supports both single-instance methods (learn_one, transform_one) and mini-batch methods (learn_many, transform_many) that operate on Pandas DataFrames. The mini-batch update uses a parallel merge formula to correctly combine existing statistics with batch statistics.
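One way to picture that merge is the pairwise-combination formula of Chan et al. over biased variances; the sketch below is illustrative, not River's exact code:

```python
def merge_stats(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Merge (count, mean, biased variance) from two partitions of a stream."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    var = (n_a * var_a + n_b * var_b + delta ** 2 * n_a * n_b / n) / n
    return n, mean, var

def batch_stats(xs):
    """Plain one-pass (count, mean, biased variance) over a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return len(xs), m, v

data = [10.5, 9.1, 10.9, 8.1, 8.9, 10.7]
n, mean, var = merge_stats(*batch_stats(data[:3]), *batch_stats(data[3:]))
_, full_mean, full_var = batch_stats(data)
print(abs(mean - full_mean) < 1e-12, abs(var - full_var) < 1e-12)
# True True
```

Merging the statistics of two halves of the data reproduces the one-pass statistics over the full data, which is what lets learn_many fold an entire DataFrame into the existing running state.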

The scaler handles edge cases gracefully: if a feature has zero variance (standard deviation is zero), the transformed value is set to 0.0 to avoid division by zero. The with_std parameter controls whether scaling by standard deviation is applied; when set to False, only mean centering is performed.

Internally, the running statistics are stored in collections.Counter (for counts) and collections.defaultdict(float) (for means and variances), making the scaler naturally handle features that appear and disappear dynamically.
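A minimal sketch of why defaultdict-backed statistics tolerate features that appear mid-stream (illustrative and mean-only; the actual scaler also tracks variance):

```python
import collections

counts = collections.Counter()
means = collections.defaultdict(float)  # unseen features start at 0.0

def learn_one(x: dict) -> None:
    """Update per-feature running counts and means for whatever keys arrive."""
    for name, value in x.items():
        counts[name] += 1
        means[name] += (value - means[name]) / counts[name]

learn_one({'a': 1.0})
learn_one({'a': 3.0, 'b': 10.0})  # feature 'b' appears for the first time here

print(counts['b'], means['a'], means['b'])
# 1 2.0 10.0
```

No registration of the feature set is needed up front: the first time a key is seen, its count and mean simply start from zero.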

Usage

Import this class when you need to:

  • Standardize features before feeding them to gradient-based models like logistic regression.
  • Build a pipeline where feature scaling precedes a classifier or regressor.
  • Process streaming data where the feature statistics are not known in advance.
  • Handle both single-instance and mini-batch data.

Code Reference

Source Location

File Lines
river/preprocessing/scale.py L80-L249

Signature

class StandardScaler(base.MiniBatchTransformer):
    def __init__(self, with_std=True) -> None

    # Single-instance methods
    def learn_one(self, x: dict)
    def transform_one(self, x: dict) -> dict

    # Mini-batch methods
    def learn_many(self, X: pd.DataFrame)
    def transform_many(self, X: pd.DataFrame) -> pd.DataFrame

Import

from river import preprocessing

scaler = preprocessing.StandardScaler()

I/O Contract

Inputs

Parameter Type Default Description
with_std bool True Whether to scale features to unit variance. If False, only mean centering is applied.
x (to learn_one/transform_one) dict (required) Feature dictionary mapping feature names to numeric values.
X (to learn_many/transform_many) pd.DataFrame (required) DataFrame where each column is a feature.

Outputs

Method Return Type Description
transform_one(x) dict Feature dictionary with standardized values (approximately zero mean and unit variance, based on the running statistics).
transform_many(X) pd.DataFrame DataFrame with standardized columns.

Usage Examples

Basic single-instance usage:

from river import preprocessing

scaler = preprocessing.StandardScaler()

X = [
    {'x': 10.557, 'y': 8.100},
    {'x': 9.100, 'y': 8.892},
    {'x': 10.945, 'y': 10.706},
]

for x in X:
    scaler.learn_one(x)
    print(scaler.transform_one(x))
# {'x': 0.0, 'y': 0.0}
# {'x': -0.999, 'y': 0.999}
# {'x': 0.937, 'y': 1.350}

In a pipeline:

from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

evaluate.progressive_val_score(datasets.Phishing(), model, metric)
# Accuracy: 88.96%

Mini-batch usage:

import pandas as pd
from river import preprocessing

scaler = preprocessing.StandardScaler()

X = pd.DataFrame({'x': [10.5, 9.1, 10.9], 'y': [8.1, 8.9, 10.7]})
scaler.learn_many(X)
print(scaler.transform_many(X))

Mean centering only (no std scaling):

from river import preprocessing

scaler = preprocessing.StandardScaler(with_std=False)

x = {'a': 5.0, 'b': 10.0}
scaler.learn_one(x)
print(scaler.transform_one(x))
# {'a': 0.0, 'b': 0.0}
