Principle:Online ml River Online Feature Selection

Knowledge Sources	Domains	Last Updated
Machine Learning Statistics	Online_Learning, Feature_Selection, Dimensionality_Reduction	2026-02-08 18:00 GMT

Overview

Online feature selection is the incremental identification and retention of the most informative features from a data stream, discarding irrelevant or redundant features to reduce dimensionality, improve model performance, and decrease computational cost -- all without requiring access to the full dataset.

Description

Feature selection is a critical preprocessing step that identifies which input variables are most relevant to the prediction task. In batch learning, methods like mutual information ranking, recursive feature elimination, or LASSO can scan the entire dataset. In online learning, feature selection must be performed incrementally, updating importance estimates as each new observation arrives.

Three major strategies for online feature selection are:

Variance-based filtering: Features with very low variance carry little discriminative information. By tracking the running variance of each feature using Welford's online algorithm, features whose variance falls below a threshold can be removed. This is a simple, unsupervised filter that requires no access to the target variable.

Score-based selection (Select-K-Best): Each feature is scored based on its statistical relationship with the target variable (e.g., using mutual information, chi-squared, or ANOVA F-statistic). The top-K features with the highest scores are retained. In the online setting, these scores are updated incrementally.

Probabilistic inclusion (Poisson Inclusion): Each feature is randomly included or excluded at each time step according to a Poisson process. Features that consistently contribute to better predictions survive, while irrelevant features are naturally pruned through stochastic selection. This approach is particularly suited to non-stationary environments where feature relevance may change over time.

Usage

Use online feature selection when:

You are processing high-dimensional streaming data and need to reduce the feature space.
Feature relevance may change over time due to concept drift.
You want to improve model efficiency by discarding uninformative features.
You need a filter-based approach that integrates into an online learning pipeline.

Theoretical Basis

Variance Threshold

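For each feature j, maintain a running variance using Welford's method (count_j, mean_j, and M2_j all start at 0):
    count_j += 1
    delta = x_j - mean_j          (deviation from the old mean)
    mean_j += delta / count_j
    delta2 = x_j - mean_j         (deviation from the updated mean)
    M2_j += delta * delta2
    variance_j = M2_j / count_j   (population variance; divide by count_j - 1 for the sample variance)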

Feature j is selected if: variance_j >= threshold
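
The update and selection rule above can be sketched in plain Python. This is a minimal illustration, not the code of any particular library: the class names `RunningVariance` and `VarianceFilter`, the `learn_one`/`transform_one` method names, and the choice to drop features that have never been observed are all assumptions made for the example.

```python
class RunningVariance:
    """Welford's online algorithm for a single feature."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean            # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self):
        # Population variance, matching variance_j = M2_j / count_j.
        return self.m2 / self.count if self.count else 0.0


class VarianceFilter:
    """Hypothetical helper: keeps features with variance >= threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.stats = {}

    def learn_one(self, x):
        for name, value in x.items():
            self.stats.setdefault(name, RunningVariance()).update(value)

    def transform_one(self, x):
        # Assumption: features never seen in learn_one are dropped.
        return {
            name: value
            for name, value in x.items()
            if name in self.stats
            and self.stats[name].variance >= self.threshold
        }


var_filter = VarianceFilter(threshold=0.01)
stream = [
    {"constant": 1.0, "varying": 0.0},
    {"constant": 1.0, "varying": 2.0},
    {"constant": 1.0, "varying": 4.0},
]
for x in stream:
    var_filter.learn_one(x)

# "constant" has zero variance and is filtered out; "varying" is kept.
print(var_filter.transform_one({"constant": 1.0, "varying": 3.0}))
```

Each feature needs only three numbers of state, so memory grows linearly with the number of distinct features and not with the length of the stream.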

Select-K-Best (Online)

For each feature j, maintain an incremental relevance score S_j:
    S_j = f(feature_j, target)   (e.g., incremental mutual information)

At prediction time:
    selected_features = top_K(S_1, ..., S_d)
    return {x_j : j in selected_features}
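
As a concrete instance of the scheme above, the sketch below uses the absolute Pearson correlation between each feature and the target as the relevance score S_j. That scoring function is a substitution chosen for the example because it has a simple exact incremental update; it is not the mutual-information score mentioned above. The names `RunningCorrelation` and `TopKSelector` are hypothetical.

```python
import math


class RunningCorrelation:
    """Incremental Pearson correlation between one feature and the target."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.c_xy = 0.0  # sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviations from the old means
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def get(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        # A constant feature or target has no defined correlation; score it 0.
        return self.c_xy / denom if denom > 0 else 0.0


class TopKSelector:
    """Hypothetical helper: keeps the k features with the highest |correlation|."""

    def __init__(self, k):
        self.k = k
        self.scores = {}

    def learn_one(self, x, y):
        for name, value in x.items():
            self.scores.setdefault(name, RunningCorrelation()).update(value, y)

    def transform_one(self, x):
        ranked = sorted(
            self.scores,
            key=lambda name: abs(self.scores[name].get()),
            reverse=True,
        )
        selected = set(ranked[: self.k])
        return {name: value for name, value in x.items() if name in selected}


kbest = TopKSelector(k=2)
stream = [
    ({"a": 1.0, "b": 5.0, "c": 1.0}, 2.0),
    ({"a": 2.0, "b": 3.0, "c": 1.0}, 4.0),
    ({"a": 3.0, "b": 6.0, "c": 1.0}, 6.0),
    ({"a": 4.0, "b": 7.0, "c": 1.0}, 8.0),
]
for x, y in stream:
    kbest.learn_one(x, y)

# "a" tracks the target exactly, "b" is moderately correlated, "c" is constant.
print(kbest.transform_one({"a": 5.0, "b": 2.0, "c": 1.0}))
```

Because the ranking is recomputed from the current scores on every call, the selected set can change as the stream evolves.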

Poisson Inclusion

For each feature j, maintain inclusion rate lambda_j:
    At each time step:
        k_j ~ Poisson(lambda_j)
        if k_j > 0:
            include feature j
        else:
            exclude feature j
    Update lambda_j based on feature utility feedback

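Because P(k_j > 0) = 1 - exp(-lambda_j) for a Poisson-distributed k_j, the rate lambda_j directly controls how often feature j is included: the inclusion probability rises toward 1 as lambda_j grows and falls toward 0 as lambda_j shrinks. Adapting lambda_j as a feature becomes more or less relevant is what lets this scheme follow non-stationary streams.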
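
The inclusion step above can be sketched as follows. Drawing k_j ~ Poisson(lambda_j) and testing k_j > 0 is equivalent to a single Bernoulli draw with probability 1 - exp(-lambda_j), which is what the code uses. The text does not specify how lambda_j is updated, so the multiplicative rule in `update_rate`, the `utility` signal in [-1, 1], the `step` parameter, and the class name `PoissonInclusionSketch` are all hypothetical choices made purely for illustration.

```python
import math
import random


class PoissonInclusionSketch:
    """Hypothetical helper: stochastic per-feature inclusion."""

    def __init__(self, initial_rate=1.0, seed=None):
        self.initial_rate = initial_rate
        self.rates = {}  # lambda_j per feature
        self.rng = random.Random(seed)

    def inclusion_probability(self, name):
        lam = self.rates.get(name, self.initial_rate)
        # P(k_j > 0) for k_j ~ Poisson(lambda_j)
        return 1.0 - math.exp(-lam)

    def transform_one(self, x):
        return {
            name: value
            for name, value in x.items()
            if self.rng.random() < self.inclusion_probability(name)
        }

    def update_rate(self, name, utility, step=0.1):
        # Assumed rule, not taken from the source: scale lambda_j up for
        # positive utility and down for negative utility.
        lam = self.rates.get(name, self.initial_rate)
        self.rates[name] = lam * math.exp(step * utility)


poisson_sel = PoissonInclusionSketch(initial_rate=1.0, seed=42)

# Simulated feedback: "useful" is repeatedly rewarded, "noise" penalised.
for _ in range(50):
    poisson_sel.update_rate("useful", utility=+1.0)
    poisson_sel.update_rate("noise", utility=-1.0)

counts = {"useful": 0, "noise": 0}
for _ in range(1000):
    for name in poisson_sel.transform_one({"useful": 1.0, "noise": 1.0}):
        counts[name] += 1

# "useful" is included almost always, "noise" only rarely.
print(counts)
```

Keeping the rate strictly positive means a penalised feature is still sampled occasionally, so it can be picked up again if it becomes relevant later in the stream.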
