Implementation:Online ml River FeatureSelection SelectKBest


Knowledge Sources
Domains Online_Learning, Feature_Selection, Supervised_Learning
Last Updated 2026-02-08 16:00 GMT

Overview

Selects the k features with the highest scores, where each score is a similarity statistic between a feature and the target, computed incrementally.

Description

SelectKBest maintains a running similarity statistic (such as Pearson correlation) between each feature and the target, and updates it incrementally as new samples arrive. It ranks features by their current scores and keeps only the top k when transforming a sample. Setting use_abs=True ranks features by the absolute value of their scores, which is useful when strong negative correlations are as informative as positive ones. The leaderboard attribute holds the current score of every feature.
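
To make the incremental update concrete, here is a minimal pure-Python sketch of a running Pearson correlation kept per feature. The class name RunningPearson and the synthetic stream are hypothetical and chosen for illustration only; this is not River's implementation, just one common way (Welford-style updates) to maintain such a statistic without storing past samples.

```python
import math
import random


class RunningPearson:
    """Hypothetical helper: Pearson correlation updated one (x, y) pair at a time."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.syy = 0.0  # running sum of squared deviations of y
        self.sxy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford-style update: old deviation times new deviation
        self.sxx += dx * (x - self.mean_x)
        self.syy += dy * (y - self.mean_y)
        self.sxy += dx * (y - self.mean_y)

    def get(self):
        denom = math.sqrt(self.sxx * self.syy)
        return self.sxy / denom if denom > 0 else 0.0


# Synthetic stream (an assumption for the demo): y depends positively on "a",
# negatively on "b", and not at all on "noise".
rng = random.Random(0)
corrs = {}  # feature name -> RunningPearson
for _ in range(500):
    x = {"a": rng.gauss(0, 1), "b": rng.gauss(0, 1), "noise": rng.gauss(0, 1)}
    y = 2.0 * x["a"] - 1.0 * x["b"] + rng.gauss(0, 0.1)
    for name, value in x.items():
        corrs.setdefault(name, RunningPearson()).update(value, y)

# Current score of every feature, then a ranking by signed score
scores = {name: c.get() for name, c in corrs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Each statistic stores only a handful of numbers, so the memory cost grows with the number of features rather than with the number of samples seen. Note that with a signed ranking the negatively correlated feature "b" ends up last even though it is informative, which is the situation use_abs is meant to address.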

Usage

Use this for supervised feature selection when you have many features and want to retain only the most predictive ones. Reducing dimensionality in this way can help limit overfitting and improve model interpretability. The similarity argument must be a bivariate statistic (stats.base.Bivariate); Pearson correlation (stats.PearsonCorr) is a common choice for numeric targets, while measures such as mutual information are often used for classification in general. Because the statistics are updated one sample at a time, the selector suits high-dimensional online learning where memory is constrained.

Code Reference

Source Location

Signature

class SelectKBest(base.SupervisedTransformer):
    def __init__(self, similarity: stats.base.Bivariate, k=10, use_abs: bool = False)

Import

from river import feature_selection
from river import stats

I/O Contract

Input: Dict[str, float] - all features
Output: Dict[str, float] - top k features
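
The contract above (a dict of all features in, a dict of the top k features out) can be sketched with a small stand-alone function. The function keep_top_k and the hand-written leaderboard below are hypothetical and only illustrate how signed and absolute ranking differ; they are not River's internal code.

```python
from collections import Counter


def keep_top_k(x, leaderboard, k, use_abs=False):
    """Hypothetical helper: keep the entries of x whose features rank in the top k."""
    if use_abs:
        key = lambda item: abs(item[1])  # rank by magnitude of the score
    else:
        key = lambda item: item[1]  # rank by signed score
    ranked = sorted(leaderboard.items(), key=key, reverse=True)
    best = {name for name, _ in ranked[:k]}
    return {name: value for name, value in x.items() if name in best}


# Hand-written scores for three features (assumed values for the demo)
leaderboard = Counter({"a": 0.6, "b": -0.9, "c": 0.1})
x = {"a": 1.5, "b": -0.3, "c": 4.0}

signed = keep_top_k(x, leaderboard, k=1)
absolute = keep_top_k(x, leaderboard, k=1, use_abs=True)
print(signed)    # {'a': 1.5}
print(absolute)  # {'b': -0.3}
```

The output keeps the original feature values untouched; only the set of keys changes.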

Usage Examples

from pprint import pprint
from river import feature_selection
from river import stats
from river import stream
from sklearn import datasets

X, y = datasets.make_regression(
    n_samples=100,
    n_features=10,
    n_informative=2,
    random_state=42
)

selector = feature_selection.SelectKBest(
    similarity=stats.PearsonCorr(),
    k=2
)

# Update each feature's running correlation with the target, one sample at a time
for xi, yi in stream.iter_array(X, y):
    selector.learn_one(xi, yi)

pprint(selector.leaderboard)
# Counter({9: 0.7898,
#          7: 0.5444,
#          8: 0.1062,
#          ...})

selector.transform_one(xi)
# {7: -1.2795, 9: -1.8408}

# Using use_abs parameter for negative correlations
import random
random.seed(42)
X_abs = [[random.random() for _ in range(3)] for _ in range(100)]
y_abs = [0.6 * x[0] - 0.9 * x[1] + 0.1 * x[2] + random.gauss(0, 0.1) for x in X_abs]

selector_with_abs = feature_selection.SelectKBest(
    stats.PearsonCorr(),
    k=1,
    use_abs=True
)
for xi, yi in stream.iter_array(X_abs, y_abs):
    selector_with_abs.learn_one(xi, yi)

# Feature 1 has a negative coefficient (-0.9) but the strongest correlation
# in absolute value, so use_abs=True selects it
selector_with_abs.transform_one({i: v for i, v in enumerate(X_abs[-1])})
# {1: 0.07524386007376704}
