
Principle:Online ML River Pipeline Composition

From Leeroopedia


Knowledge Sources: River, River Docs
Domains: Online_Learning, Software_Design, Classification
Last Updated: 2026-02-08 16:00 GMT

Overview

Pipeline composition is a design pattern for chaining multiple transformers and a final estimator into a single composite model that orchestrates the learn/transform/predict lifecycle.

Description

In online machine learning, a typical workflow involves multiple preprocessing steps (e.g., feature scaling, encoding) followed by a learning algorithm. Pipeline composition formalizes this by allowing practitioners to chain estimators into a single object that behaves as a unified model. The resulting pipeline handles the flow of data through each step automatically: during learning, features are transformed step by step before reaching the final estimator; during prediction, the same transformations are applied before the final estimator produces its output.

River implements pipeline composition through the compose.Pipeline class and the pipe operator (|). The | operator provides an expressive, concise syntax for building pipelines:

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

This creates a pipeline where features are first scaled to zero mean and unit variance, then passed to a logistic regression classifier. The pipeline exposes the same API as any individual estimator (learn_one, predict_one, predict_proba_one), making it fully interchangeable with single-step models throughout River's ecosystem.

A key design decision is how unsupervised transformers are updated. By default, unsupervised steps (like scalers) are updated during learn_one. However, River also supports updating them during predict_one via the learn_during_predict context manager. This lets unsupervised steps learn from features at prediction time, before the corresponding label arrives, which can slightly improve predictive performance in scenarios where labels are delayed.

Usage

Use pipeline composition when:

  • You need to chain preprocessing steps with a final classifier or regressor.
  • You want a single object that can be passed to evaluate.progressive_val_score for evaluation.
  • You want to ensure that transformations are applied consistently during both training and prediction.
  • You are building complex feature engineering pipelines with unions, prefixers, or function transformers.

Theoretical Basis

Pipeline composition implements functional composition of operations. Given a sequence of transformers T_1, T_2, ..., T_{n-1} and a final estimator E, the pipeline defines:

Learning:

function learn_one(x, y):
    for i = 1 to n-1:
        if T_i is unsupervised:
            T_i.learn_one(x)          # update before transforming
        x_prev = x
        x = T_i.transform_one(x)
        if T_i is supervised:
            T_i.learn_one(x_prev, y)  # update after transforming
    E.learn_one(x, y)

Prediction:

function predict_one(x):
    for i = 1 to n-1:
        x = T_i.transform_one(x)
    return E.predict_one(x)

The composition is associative: (A | B) | C is equivalent to A | (B | C). The | operator is implemented via Python's __or__ and __ror__ dunder methods, which append steps to the pipeline's internal OrderedDict.

An important subtlety in online pipelines is the order of update versus transform. During learn_one, each unsupervised transformer is first updated with the current features, then used to transform them. This ensures the transformer's statistics include the current observation before transformation. For supervised transformers (like target aggregations), the update happens after transformation to prevent target leakage.
