Implementation:Online ml River Imblearn ChebyshevSampler

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Imbalanced_Learning, Regression, Sampling
Last Updated	2026-02-08 16:00 GMT

Overview

Two wrapper regressors for imbalanced regression that use Chebyshev's inequality to identify rare target values and bias training toward them.

Description

Both methods apply Chebyshev's inequality to decide whether a target value is rare (far from the mean) or frequent (close to the mean). For a target y, the standardized deviation is t = |y - mean| / std, and values with t > 1 are treated as potentially rare. Chebyshev's inequality bounds the probability of observing a deviation of at least t standard deviations by 1/t^2 = sigma^2 / |y - mean|^2, so the bound shrinks as y moves away from the mean.

ChebyshevUnderSampler derives its selection probability for rare cases from the quantity sigma^2 / |y - mean|^2, under-sampling frequent values while preserving rare ones. It includes a "second chance" mechanism (the sp parameter) that occasionally trains on frequent cases.

ChebyshevOverSampler uses ceil(t) as a repetition weight, so the model is trained multiple times on rare instances.

Both classes maintain running estimates of the target mean and variance.
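The quantities above can be illustrated with a few lines of plain Python. This is a minimal sketch of the arithmetic only, not River's implementation: the helper names (`chebyshev_deviation`, `chebyshev_bound`, `repetition_count`) and the example mean and standard deviation are assumptions made for illustration.

```python
import math


def chebyshev_deviation(y: float, mean: float, std: float) -> float:
    """Return t = |y - mean| / std, the deviation in standard-deviation units."""
    if std <= 0:
        raise ValueError("std must be positive")
    return abs(y - mean) / std


def chebyshev_bound(t: float) -> float:
    """Chebyshev upper bound 1 / t^2 on P(|Y - mean| >= t * std), capped at 1."""
    return min(1.0, 1.0 / (t * t)) if t > 0 else 1.0


def repetition_count(t: float) -> int:
    """ceil(t), the repetition weight described for the over-sampler."""
    return math.ceil(t)


# Hypothetical running estimates of the target mean and standard deviation.
mean, std = 40.0, 2.0

for y in (40.5, 43.0, 47.0):
    t = chebyshev_deviation(y, mean, std)
    print(f"y={y} t={t:.2f} bound={chebyshev_bound(t):.3f} repeats={repetition_count(t)}")
```

The bound is only informative for t > 1, which matches the threshold used to flag a value as potentially rare. For t <= 1 it is capped at 1 and says nothing about rarity.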

Usage

Use ChebyshevUnderSampler for imbalanced regression where many frequent target values crowd out rare ones and the computational budget is limited. Use ChebyshevOverSampler when additional computation is affordable and rare values should be emphasized through repetition. Both methods adapt to the target distribution automatically and need no manually specified rarity threshold. The sp parameter (under-sampler only) controls how conservatively frequent cases are discarded. Both methods work best after an initial warm-up period, once the running statistics have stabilized. They suit applications such as demand forecasting, financial prediction, or environmental monitoring, where extreme values are informative.
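The warm-up remark can be made concrete with a small running-statistics sketch. It uses Welford's online algorithm, which is an assumption chosen for illustration (the page only states that running mean and variance are maintained). The `RunningStats` class and the sample targets are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm for the mean and sample variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, y: float) -> None:
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (y - self.mean)

    @property
    def std(self) -> float:
        # The sample variance is undefined with fewer than two observations;
        # this sketch reports 0.0 in that case.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


stats = RunningStats()
for y in [10.0, 10.0, 12.0, 11.0, 9.0, 30.0]:
    # Score y against the statistics seen so far, then update them.
    t = abs(y - stats.mean) / stats.std if stats.std > 0 else None
    print(f"n={stats.n} y={y} t={t}")
    stats.update(y)
```

While the estimated standard deviation is still zero, t cannot be computed, so no rarity decision is possible for the first few instances. Early values of t are also noisy because they rest on very few observations.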

Code Reference

Source Location

Repository: Online_ml_River
File: river/imblearn/chebyshev.py

Signature

class ChebyshevUnderSampler(
    regressor: base.Regressor,
    sp: float = 0.15,
    seed: int | None = None,
)

class ChebyshevOverSampler(
    regressor: base.Regressor
)

Import

from river import imblearn

I/O Contract

Input
Parameter	Type	Description
x	dict	Feature dictionary
y	float	Target value for regression

Output
Method	Return Type	Description
predict_one(x)	float	Delegates to wrapped regressor
learn_one(x, y)	None	Selective/repeated training based on rarity
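
The contract above can be sketched with a toy wrapper in plain Python. This is an illustration of the delegate-and-repeat pattern described for the over-sampler, not River's code: `MeanRegressor` and `RepeatingWrapper` are hypothetical names, and training at least once per instance (including during warm-up) is an assumption of this sketch.

```python
import math


class MeanRegressor:
    """Hypothetical stand-in regressor: predicts the mean of its training targets."""

    def __init__(self) -> None:
        self.n = 0
        self.total = 0.0

    def predict_one(self, x: dict) -> float:
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x: dict, y: float) -> None:
        self.n += 1
        self.total += y


class RepeatingWrapper:
    """Illustrative wrapper: repeats training ceil(t) times per instance."""

    def __init__(self, regressor) -> None:
        self.regressor = regressor
        self._n = 0
        self._mean = 0.0
        self._m2 = 0.0

    def predict_one(self, x: dict) -> float:
        # Prediction is delegated unchanged to the wrapped regressor.
        return self.regressor.predict_one(x)

    def learn_one(self, x: dict, y: float) -> None:
        std = math.sqrt(self._m2 / (self._n - 1)) if self._n > 1 else 0.0
        # Assumption of this sketch: always train at least once.
        repeats = max(1, math.ceil(abs(y - self._mean) / std)) if std > 0 else 1
        for _ in range(repeats):
            self.regressor.learn_one(x, y)
        # Update the running target statistics (Welford's algorithm).
        self._n += 1
        delta = y - self._mean
        self._mean += delta / self._n
        self._m2 += delta * (y - self._mean)


model = RepeatingWrapper(MeanRegressor())
for y in [10.0, 11.0, 9.0, 10.0, 25.0]:
    model.learn_one({"feature": 1.0}, y)
print(model.regressor.n, model.predict_one({"feature": 1.0}))
```

The wrapped regressor receives more updates than there are instances in the stream, and the extreme target 25.0 dominates them. This is the intended effect of over-sampling and also the source of its extra computational cost.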

Usage Examples

from river import datasets
from river import evaluate
from river import imblearn
from river import metrics
from river import preprocessing
from river import rules

# Under-sampling example
model = (
    preprocessing.StandardScaler() |
    imblearn.ChebyshevUnderSampler(
        regressor=rules.AMRules(n_min=50, delta=0.01),
        seed=42
    )
)

result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE(),
    print_every=500
)
print(result)  # MAE: 1.515236

# Over-sampling example
model = (
    preprocessing.StandardScaler() |
    imblearn.ChebyshevOverSampler(
        regressor=rules.AMRules(n_min=50, delta=0.01)
    )
)

result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE(),
    print_every=500
)
print(result)  # MAE: 1.66253
