
Principle:Online ML River Feature Normalization

From Leeroopedia


Knowledge Sources: River Docs
Domains: Online Machine Learning, Data Preprocessing, Feature Engineering
Last Updated: 2026-02-08 16:00 GMT

Overview

Online feature scaling technique that transforms features to a fixed [0, 1] range using running minimum and maximum statistics, enabling incremental normalization without storing the full dataset.

Description

Feature normalization is a critical preprocessing step in machine learning that rescales feature values to a common range. In the streaming (online) setting, traditional batch min-max scaling is not feasible because the full dataset is never available at once. Instead, online feature normalization maintains running statistics -- specifically a running minimum and a running maximum for each feature -- and uses these to incrementally transform each incoming observation.

The key advantage of online feature normalization is that it operates in constant memory and constant time per observation. As new data arrives, the running minimum and maximum are updated, and the transformation is applied using only these two statistics. This makes it suitable for data streams where observations arrive one at a time and the distribution of features may shift over time.

Online min-max scaling is particularly important as a preprocessing step for anomaly detection algorithms such as Half-Space Trees, which assume features are bounded in [0, 1]. Without proper normalization, such algorithms may produce unreliable anomaly scores.
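The incremental behaviour described above can be checked against batch min-max scaling: at any point in the stream, the online transform of the current value equals the batch transform computed over all values seen so far, even though only two numbers are stored. A minimal single-feature sketch in plain Python (hypothetical values; not River's actual implementation):

```python
# Online min-max scaling of a single feature, keeping only two numbers
# (the running min and max) instead of the full history.
def scale_stream(values):
    lo, hi = float("inf"), float("-inf")
    scaled = []
    for x in values:
        lo, hi = min(lo, x), max(hi, x)  # update running statistics
        rng = hi - lo
        scaled.append((x - lo) / rng if rng else 0.0)
    return scaled

print(scale_stream([5.0, 3.0, 9.0, 7.0]))
```

Each output equals what batch min-max scaling would produce over the prefix of the stream seen up to that point, which is why no buffering is needed.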

Usage

Use online feature normalization when:

  • Features have different scales and need to be brought into a common [0, 1] range
  • Data arrives as a stream and batch normalization is not possible
  • A downstream model (e.g., Half-Space Trees) requires features in a bounded range
  • Memory is constrained and the full dataset cannot be stored
  • The feature distribution may shift over time (the running min/max will adapt)

Theoretical Basis

The min-max normalization formula for a single feature value x is:

x_scaled = (x - x_min) / (x_max - x_min)

Where:

  • x_min is the running minimum of the feature observed so far
  • x_max is the running maximum of the feature observed so far

Safe division: When x_max equals x_min (i.e., all observed values for a feature are identical), the denominator is zero. In this case, the transformation returns 0 to avoid division-by-zero errors.
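The guard can be written as a two-line helper; this sketch returns 0.0 when the denominator vanishes, matching the convention above:

```python
def safe_div(a, b):
    """Return a / b, or 0.0 when b == 0 (all observed values identical)."""
    return a / b if b != 0 else 0.0

print(safe_div(2.0, 4.0))  # 0.5
print(safe_div(0.0, 0.0))  # 0.0
```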

Online update rules:

For each new observation x_t:
    x_min = min(x_min, x_t)
    x_max = max(x_max, x_t)
    x_scaled = safe_div(x_t - x_min, x_max - x_min)

Where safe_div(a, b) returns a / b if b != 0, otherwise returns 0.
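Tracing the update rules on a short stream makes the behaviour concrete (hypothetical values; note that the running min and max only ever widen, they never shrink):

```python
# Step-by-step trace of the online update rules for one feature.
lo, hi = float("inf"), float("-inf")
for x_t in [4.0, 10.0, 1.0, 7.0]:
    lo = min(lo, x_t)                          # x_min update
    hi = max(hi, x_t)                          # x_max update
    rng = hi - lo
    x_scaled = (x_t - lo) / rng if rng != 0 else 0.0   # safe division
    print(f"x_t={x_t}  min={lo}  max={hi}  scaled={x_scaled:.3f}")
```

The first observation always scales to 0 (min equals max), and each new extreme value scales to exactly 0 or 1 at the moment it arrives.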

Properties:

  • Time complexity: O(d) per observation, where d is the number of features
  • Space complexity: O(d) -- stores one Min and one Max statistic per feature
  • Output range: [0, 1] for each feature -- guaranteed because each observation updates the running min/max before it is transformed, so x always lies within [x_min, x_max]

Pseudocode:

INIT:
    min_stats = {}    # running Min per feature
    max_stats = {}    # running Max per feature

LEARN_ONE(x):
    for each feature i in x:
        min_stats[i].update(x[i])
        max_stats[i].update(x[i])

TRANSFORM_ONE(x):
    result = {}
    for each feature i in x:
        result[i] = safe_div(x[i] - min_stats[i].get(), max_stats[i].get() - min_stats[i].get())
    return result
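The pseudocode maps directly onto a small class. The following is a plain-Python sketch of the same learn/transform split, modelled on (but not identical to) River's preprocessing.MinMaxScaler interface; the class name and internals here are illustrative:

```python
class OnlineMinMaxScaler:
    """Scale each feature to [0, 1] using running min/max statistics."""

    def __init__(self):
        self.min = {}  # running minimum per feature
        self.max = {}  # running maximum per feature

    def learn_one(self, x):
        for i, v in x.items():
            self.min[i] = min(self.min.get(i, v), v)
            self.max[i] = max(self.max.get(i, v), v)

    def transform_one(self, x):
        # Assumes learn_one has already seen every feature in x.
        out = {}
        for i, v in x.items():
            rng = self.max[i] - self.min[i]
            out[i] = (v - self.min[i]) / rng if rng != 0 else 0.0
        return out

scaler = OnlineMinMaxScaler()
for x in [{"a": 2.0, "b": 10.0}, {"a": 4.0, "b": 30.0}, {"a": 3.0, "b": 20.0}]:
    scaler.learn_one(x)
    print(scaler.transform_one(x))
```

In River itself, the corresponding transformer lives in the preprocessing module and follows the same learn_one/transform_one contract, so it can be chained ahead of a bounded-range model such as Half-Space Trees in a pipeline.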
