Principle:Online ml River Online Feature Extraction

Knowledge Sources: Machine Learning Feature Engineering
Domains: Online_Learning, Feature_Engineering, Data_Preprocessing
Last Updated: 2026-02-08 18:00 GMT

Overview

Online feature extraction refers to the incremental computation of derived features from raw input data in a streaming setting. Unlike batch feature extraction where the entire dataset is available, streaming feature extractors must compute transformations using only the data seen so far, updating their internal state with each new observation.

Description

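Feature extraction transforms raw input variables into representations that are more informative for downstream learning algorithms. In the online learning setting, feature extraction must be performed incrementally as each observation arrives. This imposes two constraints. First, transformations cannot rely on global statistics unless those statistics are maintained incrementally. Second, each update must run in time that does not grow with the number of observations already seen, and it must use memory bounded by the extractor's state (for example, the number of groups or the vocabulary size) rather than by the length of the stream.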

Key categories of streaming feature extraction include:

Aggregation-based features: Computing running statistics (mean, variance, count, min, max) grouped by categorical variables. These features capture temporal patterns and per-group behavior, analogous to SQL GROUP BY aggregations computed incrementally.

Polynomial feature expansion: Generating interaction and polynomial terms from existing features. For features $x_{1}, x_{2}$, a degree-2 polynomial expansion produces $x_{1}^{2}$, $x_{1} x_{2}$, and $x_{2}^{2}$ in addition to the original features. This allows linear models to capture nonlinear relationships.

Random kernel approximation: Using random Fourier features (e.g., an RBF sampler) to approximate nonlinear kernel functions. This maps inputs into a higher-dimensional space where linear methods can approximate kernel methods. Each sample is mapped in $O(Dd)$ time for $D$ random features and $d$ input dimensions, independent of how many samples have been seen.

Text vectorization: Converting text data into numerical feature vectors using techniques such as bag-of-words or TF-IDF. The vocabulary and the document-frequency counts are updated incrementally as new documents arrive.
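The text vectorization item can be illustrated with a minimal incremental TF-IDF sketch. It is a self-contained example, not River's implementation. The class name `OnlineTfIdf`, the lowercase whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for clarity. The `learn_one`/`transform_one` method names follow the convention River uses for its transformers. Features are returned as a sparse `dict`, so a previously unseen word becomes a new key and no fixed-size vector has to be resized.

```python
import math
from collections import Counter


class OnlineTfIdf:
    """Hypothetical incremental TF-IDF: vocabulary and document frequencies grow with the stream."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()  # number of documents seen so far that contain each term

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Assumed, deliberately simple tokenizer: lowercase and split on whitespace
        return text.lower().split()

    def learn_one(self, text: str) -> None:
        self.n_docs += 1
        # Each distinct term counts once per document toward its document frequency
        self.doc_freq.update(set(self.tokenize(text)))

    def transform_one(self, text: str) -> dict[str, float]:
        tf = Counter(self.tokenize(text))
        total = sum(tf.values())
        features = {}
        for term, count in tf.items():
            # Assumed smoothed IDF: finite even for terms never seen (doc_freq = 0)
            idf = math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1
            # Term frequency is the raw count normalized by document length
            features[term] = (count / total) * idf
        return features


tfidf = OnlineTfIdf()
for doc in ["the cat sat", "the dog sat", "the cat ran"]:
    tfidf.learn_one(doc)
print(tfidf.transform_one("the dog"))
```

A term that appears in every document seen so far (here `the`) receives the minimum IDF weight of 1, while rarer terms such as `dog` are weighted up. These weights keep shifting as more documents are learned.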

Usage

Use online feature extraction when:

You need to enrich raw features with interactions, aggregations, or nonlinear transformations in a streaming pipeline.
You are building online learning pipelines that require feature engineering as a preprocessing step.
You need to approximate kernel methods efficiently using random features.
You are processing text data and need incremental vectorization.

Theoretical Basis

Polynomial Expansion

Given input x = (x_1, ..., x_d) and degree p:
  Generate all monomials x_{i1}^{a1} * ... * x_{ik}^{ak}
  with distinct indices i1 < ... < ik, exponents a1, ..., ak >= 1,
  and a1 + ... + ak <= p   (so the constant term is excluded)

For degree 2 with features (x1, x2):
  Output: (x1, x2, x1*x1, x1*x2, x2*x2)

Number of features: C(d + p, p) - 1   (all monomials of degree <= p, minus the constant term; for d = 2, p = 2 this is C(4, 2) - 1 = 5)
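The expansion can be sketched in a few lines. This sketch assumes features arrive as a `dict` mapping names to values. The function name `poly_expand` and the `"x1*x2"` key naming scheme are illustrative choices. Each multiset of p feature names corresponds to exactly one monomial of degree p, so `itertools.combinations_with_replacement` enumerates them. The last line checks the output size against C(d + p, p) - 1.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def poly_expand(x: dict[str, float], degree: int) -> dict[str, float]:
    """Hypothetical helper: all monomials of the input features with total degree 1..degree."""
    names = sorted(x)  # fixed order so the output keys are deterministic
    out = {}
    for p in range(1, degree + 1):
        # Each multiset of p names is one monomial, e.g. ("x1", "x1") -> x1*x1
        for combo in combinations_with_replacement(names, p):
            out["*".join(combo)] = prod(x[n] for n in combo)
    return out


print(poly_expand({"x1": 2.0, "x2": 3.0}, degree=2))
# {'x1': 2.0, 'x2': 3.0, 'x1*x1': 4.0, 'x1*x2': 6.0, 'x2*x2': 9.0}

# d = 3 inputs, degree p = 3: C(6, 3) - 1 = 19 output features
assert len(poly_expand({"a": 1.0, "b": 1.0, "c": 1.0}, degree=3)) == comb(3 + 3, 3) - 1
```

The work per sample depends only on d and p, not on the stream length. However, the count grows quickly with both, which makes the expansion expensive for wide inputs or high degrees.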

Random Fourier Features (RBF Sampler)

The RBF kernel $k(x, y) = \exp(-\gamma \|x - y\|^{2})$ can be approximated via random features:

1. Sample D random vectors w_1, ..., w_D ~ N(0, 2*gamma*I)
2. Sample D random offsets b_1, ..., b_D ~ Uniform(0, 2*pi)
3. Map input x to:
   z(x) = sqrt(2/D) * [cos(w_1^T x + b_1), ..., cos(w_D^T x + b_D)]

Property: E[z(x)^T z(y)] = k(x, y)
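A short derivation explains why the property holds and why the frequencies are drawn with covariance $2\gamma I$. Write $\delta = x - y$ and apply $\cos a \cos b = \tfrac{1}{2}[\cos(a - b) + \cos(a + b)]$:

$$z(x)^\top z(y) = \frac{2}{D}\sum_{j=1}^{D}\cos(w_j^\top x + b_j)\cos(w_j^\top y + b_j) = \frac{1}{D}\sum_{j=1}^{D}\left[\cos(w_j^\top \delta) + \cos\!\left(w_j^\top (x + y) + 2b_j\right)\right]$$

Because $b_j \sim \mathrm{Uniform}(0, 2\pi)$, the second cosine has expectation zero. For the first cosine, the characteristic function of $w \sim \mathcal{N}(0, 2\gamma I)$ gives

$$\mathbb{E}_w\left[\cos(w^\top \delta)\right] = \exp\!\left(-\tfrac{1}{2}\,\delta^\top (2\gamma I)\,\delta\right) = \exp\!\left(-\gamma \|\delta\|^{2}\right) = k(x, y).$$

Each of the $D$ i.i.d. terms is therefore an unbiased estimate of $k(x, y)$, and the variance of their average shrinks as $1/D$.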

This mapping is computed once per sample in $O(Dd)$ time and enables linear models to approximate RBF kernel methods.
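The three steps can be checked numerically with a short NumPy sketch. It is a standalone illustration, not River's implementation. The name `make_rbf_sampler`, its parameters, and the fixed input dimension `d` are assumptions. The random vectors and offsets are drawn once. Each sample is then mapped with one D-by-d matrix-vector product, and the inner product of two mapped samples is compared with the exact kernel value.

```python
import numpy as np


def make_rbf_sampler(d: int, n_components: int, gamma: float, seed: int = 0):
    """Hypothetical helper: draw the random parameters once and return the per-sample map z."""
    rng = np.random.default_rng(seed)
    # Step 1: w_j ~ N(0, 2*gamma*I), i.e. standard normals scaled by sqrt(2*gamma)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(n_components, d))
    # Step 2: b_j ~ Uniform(0, 2*pi)
    b = rng.uniform(0, 2 * np.pi, size=n_components)

    def z(x: np.ndarray) -> np.ndarray:
        # Step 3: O(D*d) per sample, one matrix-vector product plus D cosines
        return np.sqrt(2 / n_components) * np.cos(W @ x + b)

    return z


gamma = 0.5
z = make_rbf_sampler(d=3, n_components=5000, gamma=gamma)
x = np.array([0.1, 0.2, 0.3])
y = np.array([0.3, 0.0, 0.4])
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = z(x) @ z(y)
print(exact, approx)  # approx is close to exact; the gap shrinks as n_components grows
```

Because the random parameters are fixed after construction, the same `z` must be applied to every sample in the stream. Otherwise the mapped features would not live in a consistent space for the downstream linear model.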

Streaming Aggregation

For a numeric feature x grouped by key k:
  Maintain running statistics per group:
    count[k] += 1
    old_mean = mean[k]
    mean[k] += (x - old_mean) / count[k]          (Welford's method)
    M2[k] += (x - old_mean) * (x - mean[k])
    var[k] = M2[k] / count[k]                     (population variance; M2[k] / (count[k] - 1) gives the sample variance)

These aggregated statistics can serve as features for downstream models, capturing group-level behavior patterns.
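A runnable version of the per-group update might look like the sketch below. It is extended with the count, min and max statistics listed under aggregation-based features. The class `GroupStats` and its `update`/`features` methods are hypothetical names for illustration. The key detail is saving the mean before updating it: the M2 term multiplies the deviation from the old mean by the deviation from the new one.

```python
import math
from collections import defaultdict


class GroupStats:
    """Hypothetical streaming GROUP BY: per-key count, mean, variance, min and max."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)
        self.m2 = defaultdict(float)  # running sum of squared deviations from the mean
        self.min = defaultdict(lambda: math.inf)
        self.max = defaultdict(lambda: -math.inf)

    def update(self, key, x: float) -> None:
        # Welford's update: O(1) time, touching only this key's state
        self.count[key] += 1
        old_mean = self.mean[key]
        self.mean[key] = old_mean + (x - old_mean) / self.count[key]
        self.m2[key] += (x - old_mean) * (x - self.mean[key])
        self.min[key] = min(self.min[key], x)
        self.max[key] = max(self.max[key], x)

    def features(self, key, ddof: int = 0) -> dict:
        # ddof=0: population variance M2/n; ddof=1: sample variance M2/(n-1)
        n = self.count.get(key, 0)  # .get avoids creating state for unseen keys
        if n == 0:
            return {"count": 0}  # unseen group: no statistics yet
        var = self.m2[key] / (n - ddof) if n > ddof else 0.0
        return {"count": n, "mean": self.mean[key], "var": var,
                "min": self.min[key], "max": self.max[key]}


# Interleaved stream of (group key, value) pairs
stream = [("a", 2.0), ("b", 1.0), ("a", 4.0), ("a", 4.0), ("b", 3.0),
          ("a", 4.0), ("a", 5.0), ("a", 5.0), ("a", 7.0), ("a", 9.0)]
stats = GroupStats()
for key, value in stream:
    stats.update(key, value)
print(stats.features("a"))  # count 8, mean ~5.0, var ~4.0, min 2.0, max 9.0
```

Memory grows with the number of distinct keys rather than with the number of observations. Each key needs only five numbers, however many values it has seen.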
