Principle:Online ml River Pipeline Transformers

Knowledge Sources	Machine Learning Functional Programming
Domains	Online_Learning Feature_Engineering Software_Design
Last Updated	2026-02-08 18:00 GMT

Overview

Functional and compositional transformers are modular building blocks for constructing feature engineering pipelines in online machine learning. They let practitioners define custom transformations, select feature subsets, rename features, apply transformations per group, and combine transformers through unions and products, all within the incremental, one-instance-at-a-time paradigm.

Description

Feature engineering in online ML requires transformers that process individual observations incrementally. Beyond standard preprocessing (scaling, encoding), real-world pipelines often need:

Function transformers: Wrap arbitrary Python functions as transformer objects, enabling ad-hoc feature engineering within a pipeline.
Feature selection: Select a subset of features by name or pattern, discarding irrelevant inputs.
Feature renaming: Rename or prefix features to avoid collisions when combining multiple feature sources.
Grouping: Apply a transformer independently to subgroups defined by a categorical feature (e.g., per-user or per-category statistics).
Target transformation: Transform the target variable before regression and inverse-transform predictions, enabling techniques like log-target regression.
Transformer unions: Run multiple transformers in parallel on the same input and concatenate their outputs, creating richer feature representations.
Transformer products: Combine transformers multiplicatively to create interaction features.

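Each of these components implements the standard transformer interface (learn_one, transform_one), with one exception: the target-transforming wrapper exposes the regressor interface (learn_one, predict_one) instead. Sharing these interfaces is what lets the components compose with pipelines and estimators.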
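To make the interface concrete, here is a minimal sketch of a stateful transformer that follows the learn_one / transform_one convention. The class name MeanCenterer and its behavior are hypothetical illustrations written in plain Python; this is not River's own implementation.

```python
from typing import Dict

Features = Dict[str, float]


class MeanCenterer:
    """Hypothetical stateful transformer: subtracts a running mean per feature."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.means: Dict[str, float] = {}

    def learn_one(self, x: Features) -> None:
        # Update each feature's running mean with the new observation.
        for name, value in x.items():
            n = self.counts.get(name, 0) + 1
            mean = self.means.get(name, 0.0)
            self.counts[name] = n
            self.means[name] = mean + (value - mean) / n

    def transform_one(self, x: Features) -> Features:
        # Features never seen before are centered against a mean of 0.
        return {name: value - self.means.get(name, 0.0) for name, value in x.items()}


centerer = MeanCenterer()
for obs in ({"temp": 10.0}, {"temp": 20.0}, {"temp": 30.0}):
    centerer.learn_one(obs)

centered = centerer.transform_one({"temp": 26.0})
```

Because state is updated only in learn_one and read only in transform_one, any object with these two methods can be slotted into a chain of steps without the other steps knowing what it does internally.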

Usage

Use functional and compositional transformers when:

You need custom feature engineering steps within a streaming pipeline.
You want to select, rename, or reorganize features programmatically.
You need group-level transformations (e.g., per-category statistics).
You want to combine multiple feature extraction paths into a single feature vector.
You need to transform the target variable for regression tasks.

Theoretical Basis

Function transformer: Lifts any function $f : \mathcal{X} \to \mathcal{X}'$ into the transformer interface:

transform_one(x) = f(x)
learn_one(x) = no-op  (stateless)
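The lifting described above can be sketched in a few lines. FuncTransformerSketch and add_log_price are hypothetical names chosen for illustration, and the code is an assumption-laden stand-in rather than the library's implementation.

```python
import math
from typing import Callable, Dict

Features = Dict[str, float]


class FuncTransformerSketch:
    """Hypothetical wrapper that exposes a plain function as a transformer."""

    def __init__(self, func: Callable[[Features], Features]) -> None:
        self.func = func

    def learn_one(self, x: Features) -> None:
        # Stateless: there is nothing to update.
        return None

    def transform_one(self, x: Features) -> Features:
        return self.func(x)


def add_log_price(x: Features) -> Features:
    # Return a new dict so the caller's observation is not mutated.
    return {**x, "log_price": math.log(x["price"])}


transformer = FuncTransformerSketch(add_log_price)
transformer.learn_one({"price": 1.0, "qty": 3.0})
out = transformer.transform_one({"price": 1.0, "qty": 3.0})
```

Returning a fresh dictionary instead of mutating the input keeps the wrapped function safe to reuse in several branches of a pipeline.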

Feature selection as projection: Given feature space $\mathcal{X} = \{f_{1}, f_{2}, \dots, f_{d}\}$ and a subset $S \subseteq \{1, \dots, d\}$:

transform_one(x) = {f_i: x[f_i] for i in S}
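A minimal sketch of this projection follows. SelectSketch is a hypothetical name, and the choice to skip requested features that are absent from an observation (rather than raise an error) is an assumption made for the example.

```python
from typing import Any, Dict

Features = Dict[str, Any]


class SelectSketch:
    """Hypothetical selector: keeps only the whitelisted feature names."""

    def __init__(self, *keys: str) -> None:
        self.keys = set(keys)

    def learn_one(self, x: Features) -> None:
        # Selection is stateless.
        return None

    def transform_one(self, x: Features) -> Features:
        # Assumption: names missing from x are silently skipped.
        return {name: value for name, value in x.items() if name in self.keys}


selector = SelectSketch("a", "c")
selected = selector.transform_one({"a": 1, "b": 2, "c": 3})
partial = selector.transform_one({"a": 1, "b": 2})
```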

Transformer union (parallel composition): Given transformers $T_{1}, T_{2}, \dots, T_{k}$:

transform_one(x) = T_1(x) | T_2(x) | ... | T_k(x)

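Here | denotes dictionary merge. The output feature space is the union of the transformers' output features. If two transformers emit the same feature name, the merge keeps only one of the two values, so prefixing or renaming features beforehand is what prevents collisions.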
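The parallel composition can be sketched as follows. Stateless and UnionSketch are hypothetical helper classes written for this example, not the library's own; the second half shows what happens when output names collide under a plain dictionary merge.

```python
from typing import Callable, Dict

Features = Dict[str, float]


class Stateless:
    """Hypothetical helper: wraps a function as a transformer with a no-op learn_one."""

    def __init__(self, func: Callable[[Features], Features]) -> None:
        self.func = func

    def learn_one(self, x: Features) -> None:
        return None

    def transform_one(self, x: Features) -> Features:
        return self.func(x)


class UnionSketch:
    """Hypothetical union: feeds the same input to every branch and merges the outputs."""

    def __init__(self, *transformers) -> None:
        self.transformers = list(transformers)

    def learn_one(self, x: Features) -> None:
        for t in self.transformers:
            t.learn_one(x)

    def transform_one(self, x: Features) -> Features:
        merged: Features = {}
        for t in self.transformers:
            # Later branches overwrite earlier ones on duplicate names.
            merged.update(t.transform_one(x))
        return merged


# Distinct suffixes keep the two branches from colliding.
squares = Stateless(lambda x: {f"{k}_sq": v * v for k, v in x.items()})
doubles = Stateless(lambda x: {f"{k}_x2": 2 * v for k, v in x.items()})
features = UnionSketch(squares, doubles).transform_one({"a": 3.0})

# Both branches emit "a", so only the last value survives the merge.
clash = UnionSketch(
    Stateless(lambda x: {"a": x["a"] + 1}),
    Stateless(lambda x: {"a": x["a"] * 10}),
)
clashed = clash.transform_one({"a": 3.0})
```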

Grouper (conditional transformation): Given a grouping key $g$ and a transformer $T$, the grouper maintains a separate instance $T_{v}$ for each value $v$ of $g$:

transform_one(x) = T_{x[g]}.transform_one(x)
learn_one(x) = T_{x[g]}.learn_one(x)
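One way to realize the per-group instances is to deep-copy a prototype transformer the first time each group value appears. The sketch below assumes that strategy; GrouperSketch and RunningMax are hypothetical names, and the code is illustrative rather than the library's implementation.

```python
import copy
from typing import Any, Dict

Features = Dict[str, Any]


class RunningMax:
    """Hypothetical stateful transformer: appends the largest value seen so far."""

    def __init__(self, on: str) -> None:
        self.on = on
        self.max = None

    def learn_one(self, x: Features) -> None:
        value = x[self.on]
        if self.max is None or value > self.max:
            self.max = value

    def transform_one(self, x: Features) -> Features:
        return {**x, f"max_{self.on}": self.max}


class GrouperSketch:
    """Hypothetical grouper: routes each observation to its group's own copy."""

    def __init__(self, transformer: Any, by: str) -> None:
        self.prototype = transformer
        self.by = by
        self.instances: Dict[Any, Any] = {}

    def _instance_for(self, x: Features) -> Any:
        key = x[self.by]
        if key not in self.instances:
            # Each group starts from a fresh copy of the untouched prototype.
            self.instances[key] = copy.deepcopy(self.prototype)
        return self.instances[key]

    def learn_one(self, x: Features) -> None:
        self._instance_for(x).learn_one(x)

    def transform_one(self, x: Features) -> Features:
        return self._instance_for(x).transform_one(x)


grouper = GrouperSketch(RunningMax(on="amount"), by="user")
for user, amount in (("alice", 5.0), ("bob", 9.0), ("alice", 7.0)):
    grouper.learn_one({"user": user, "amount": amount})

alice = grouper.transform_one({"user": "alice", "amount": 1.0})
bob = grouper.transform_one({"user": "bob", "amount": 1.0})
```

Note that memory grows with the number of distinct group values, which matters for high-cardinality keys in a long-running stream.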

Target transform regression: Given a target transformation $h$ and its inverse $h^{-1}$ (written h_inv below):

learn_one(x, y):  regressor.learn_one(x, h(y))
predict_one(x):   return h_inv(regressor.predict_one(x))

Common choices for $h$ include the log transform and the Box-Cox transform, both of which require strictly positive targets.
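The wrapper logic can be sketched with a log target. TargetTransformSketch and MeanRegressor are hypothetical classes invented for this example; the inner regressor simply predicts the running mean of the targets it has seen, which keeps the effect of the transformation easy to follow.

```python
import math
from typing import Any, Callable, Dict

Features = Dict[str, float]


class MeanRegressor:
    """Hypothetical regressor: ignores x and predicts the running mean of y."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x: Features, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict_one(self, x: Features) -> float:
        return self.mean


class TargetTransformSketch:
    """Hypothetical wrapper: trains on h(y) and maps predictions back with h_inv."""

    def __init__(
        self,
        regressor: Any,
        func: Callable[[float], float],
        inverse_func: Callable[[float], float],
    ) -> None:
        self.regressor = regressor
        self.func = func
        self.inverse_func = inverse_func

    def learn_one(self, x: Features, y: float) -> None:
        self.regressor.learn_one(x, self.func(y))

    def predict_one(self, x: Features) -> float:
        return self.inverse_func(self.regressor.predict_one(x))


model = TargetTransformSketch(MeanRegressor(), func=math.log, inverse_func=math.exp)
for y in (1.0, 100.0):
    model.learn_one({}, y)

prediction = model.predict_one({})
```

With targets 1 and 100, the inner model averages the logs, so the back-transformed prediction is close to their geometric mean (about 10) rather than their arithmetic mean (50.5). This illustrates how a log target dampens the influence of large values.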
