Implementation:Online ml River FeatureSelection SelectKBest

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Feature_Selection, Supervised_Learning
Last Updated	2026-02-08 16:00 GMT

Overview

Selects the k features with the highest scores, where each score is a similarity statistic between a feature and the target that is computed incrementally.

Description

SelectKBest maintains one running similarity statistic (such as Pearson correlation) per feature, measuring how that feature relates to the target. Each call to learn_one updates these statistics with the new sample, so no past data needs to be stored. During transformation, features are ranked by their current scores and only the top k are retained. By default the ranking uses the signed score, so a strongly negative correlation ranks last; setting use_abs=True ranks by absolute value instead, which is useful when negative correlations are as informative as positive ones. The leaderboard attribute holds the current score of every feature seen so far.
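The incremental update described above can be illustrated with a minimal pure-Python sketch. It is not River's implementation: the class name RunningPearson and its attributes are hypothetical, and the update rule is a standard Welford-style running co-moment, assumed here only to show how a correlation can be maintained one sample at a time without storing past data.

```python
import math


class RunningPearson:
    """Hypothetical helper: Pearson correlation updated one (x, y) pair at a time."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford-style updates: old deviation times new deviation
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)
        return self

    def get(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        # Undefined correlation (fewer than two points, or zero variance) maps to 0.0
        return self.c_xy / denom if denom > 0 else 0.0


pos = RunningPearson()  # feature that rises with the target
neg = RunningPearson()  # feature that falls as the target rises
for x in [1.0, 2.0, 3.0, 4.0]:
    pos.update(x, 2 * x + 1)
    neg.update(x, -3 * x)

print(pos.get(), neg.get())
```

Each statistic keeps a constant number of floats regardless of how many samples it has seen, which is why one such object per feature stays cheap in a streaming setting.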

Usage

Use this for supervised feature selection when you have many features and want to retain only the most predictive ones. Keeping fewer features reduces dimensionality, can limit overfitting, and makes the downstream model easier to interpret. Pearson correlation (stats.PearsonCorr) is a common choice for numeric targets; a mutual-information-style measure could serve for classification, provided it implements the stats.base.Bivariate interface that the similarity parameter expects. Because only one running statistic is stored per feature, the selector suits high-dimensional online learning where memory is constrained.

Code Reference

Source Location

Repository: Online_ml_River
File: river/feature_selection/k_best.py

Signature

class SelectKBest(base.SupervisedTransformer):
    def __init__(self, similarity: stats.base.Bivariate, k=10, use_abs: bool = False)

Import

from river import feature_selection
from river import stats

I/O Contract

Input	Output
Dict[Hashable, float] - All features (learn_one also takes the target y)	Dict[Hashable, float] - Top k features
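The contract above amounts to filtering a feature dictionary by the current ranking. The sketch below illustrates this with fixed, made-up scores; keep_top_k is a hypothetical helper, not part of River, and its pass-through behaviour for an empty leaderboard is an assumption.

```python
from collections import Counter


def keep_top_k(x, leaderboard, k, use_abs=False):
    """Hypothetical helper: keep the k features of x with the best scores."""
    if not leaderboard:
        # Assumption: with no scores yet, pass the features through unchanged
        return dict(x)
    # Rank by magnitude when use_abs is set, otherwise by the signed score
    scores = Counter(
        {name: abs(score) if use_abs else score for name, score in leaderboard.items()}
    )
    best = {name for name, _ in scores.most_common(k)}
    return {name: value for name, value in x.items() if name in best}


# Illustrative scores only: "b" has the strongest, but negative, relationship
leaderboard = {"a": 0.6, "b": -0.9, "c": 0.1}
x = {"a": 1.5, "b": -2.0, "c": 0.3}

signed = keep_top_k(x, leaderboard, k=1)                  # {"a": 1.5}
absolute = keep_top_k(x, leaderboard, k=1, use_abs=True)  # {"b": -2.0}
```

With signed ranking the negatively correlated feature "b" is dropped; ranking by absolute value keeps it, which mirrors the purpose of use_abs described earlier.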

Usage Examples

from pprint import pprint
from river import feature_selection
from river import stats
from river import stream
from sklearn import datasets

X, y = datasets.make_regression(
    n_samples=100,
    n_features=10,
    n_informative=2,
    random_state=42
)

selector = feature_selection.SelectKBest(
    similarity=stats.PearsonCorr(),
    k=2
)

# Update the per-feature similarity statistics one sample at a time
for xi, yi in stream.iter_array(X, y):
    selector.learn_one(xi, yi)

pprint(selector.leaderboard)
# Counter({9: 0.7898,
#          7: 0.5444,
#          8: 0.1062,
#          ...})

selector.transform_one(xi)
# {7: -1.2795, 9: -1.8408}

# Using use_abs parameter for negative correlations
import random
random.seed(42)
X_abs = [[random.random() for _ in range(3)] for _ in range(100)]
y_abs = [0.6 * x[0] - 0.9 * x[1] + 0.1 * x[2] + random.gauss(0, 0.1) for x in X_abs]

selector_with_abs = feature_selection.SelectKBest(
    stats.PearsonCorr(),
    k=1,
    use_abs=True
)
for xi, yi in stream.iter_array(X_abs, y_abs):
    selector_with_abs.learn_one(xi, yi)

selector_with_abs.transform_one({i: v for i, v in enumerate(X_abs[-1])})
# {1: 0.07524386007376704}  # Feature 1 is kept because it has the largest absolute correlation

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
