Implementation:Online ml River Imblearn ChebyshevSampler

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Imbalanced_Learning, Regression, Sampling
Last Updated 2026-02-08 16:00 GMT

Overview

Chebyshev-based sampling methods for imbalanced regression that use Chebyshev's inequality to identify and bias sampling toward rare target values.

Description

These methods apply Chebyshev's inequality to decide whether a target value is rare (far from the mean) or frequent (close to the mean). For a target y, the standardized deviation is t = |y - mean| / std, and values with t > 1 are treated as potentially rare. ChebyshevUnderSampler derives its selection probability for rare cases from the Chebyshev bound sigma^2 / |y - mean|^2, so that frequent values are under-sampled while rare ones are preserved. It also includes a "second chance" mechanism (the sp parameter) that occasionally trains on frequent cases. ChebyshevOverSampler uses ceil(t) as a repetition weight, training the wrapped model multiple times on rare instances. Both samplers maintain running estimates of the target mean and variance.
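The quantities named above can be illustrated with a small standard-library sketch. It is not River's implementation: the `RunningStats` class and `rarity` helper are hypothetical names introduced here, the use of the sample variance is an assumption, and clamping the repetition weight to at least 1 is also an assumption made for illustration.

```python
import math


class RunningStats:
    """Welford-style running mean and variance of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (y - self.mean)

    @property
    def variance(self):
        # Sample variance (assumption); 0.0 until two values have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def rarity(y, stats):
    """Return (t, chebyshev_bound, repetition_weight), or None if variance is 0."""
    var = stats.variance
    if var <= 0.0:
        return None
    dev = abs(y - stats.mean)
    t = dev / math.sqrt(var)
    # Chebyshev: P(|Y - mean| >= dev) <= var / dev**2, informative only for t > 1.
    bound = var / dev**2 if t > 1 else 1.0
    weight = max(1, math.ceil(t))  # assumption: never fewer than one pass
    return t, bound, weight


stats = RunningStats()
for value in [10.0, 12.0, 8.0, 11.0, 9.0]:
    stats.update(value)

frequent = rarity(10.5, stats)  # close to the mean, t < 1
rare = rarity(16.0, stats)      # far from the mean, t > 1
```

With these five targets the running mean is 10 and the sample variance is 2.5, so a target of 10.5 gets a repetition weight of 1 while a target of 16 (t of roughly 3.8) gets a weight of 4 and a Chebyshev bound of about 0.07.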

Usage

Use ChebyshevUnderSampler for imbalanced regression where many frequent target values crowd out rare ones and the computational budget is limited. Use ChebyshevOverSampler when you can afford additional computation and want to emphasize rare values through repetition. Both methods adapt to the target distribution automatically, without a manually specified rarity threshold. The sp parameter (under-sampler only) controls how conservatively frequent cases are discarded. Both methods work best after an initial warm-up period, once the running mean and variance have stabilized. They suit applications such as demand forecasting, financial prediction, or environmental monitoring, where extreme values are informative.
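The extra computation that over-sampling implies can be made concrete with a toy loop. This is a minimal sketch under stated assumptions, not River's code: `CountingLearner`, `oversample_stream`, and the `warm_up` argument are hypothetical names, and recomputing the statistics from a list is done only for brevity (a real online implementation would keep running estimates).

```python
import math
import statistics


class CountingLearner:
    """Hypothetical stand-in for a regressor; it only counts learn_one calls."""

    def __init__(self):
        self.calls = 0

    def learn_one(self, x, y):
        self.calls += 1


def oversample_stream(learner, stream, warm_up=2):
    """Feed (x, y) pairs to learner, repeating rare targets ceil(t) times."""
    seen = []
    for x, y in stream:
        repeats = 1  # default during warm-up or when the spread is zero
        if len(seen) >= max(warm_up, 2):  # stdev needs at least two values
            mean = statistics.fmean(seen)
            std = statistics.stdev(seen)
            if std > 0:
                t = abs(y - mean) / std
                repeats = max(1, math.ceil(t))  # assumption: at least one pass
        for _ in range(repeats):
            learner.learn_one(x, y)
        seen.append(y)  # statistics are updated after the learning step
    return learner


targets = [10.0, 12.0, 8.0, 11.0, 9.0, 16.0]
stream = [({"i": i}, y) for i, y in enumerate(targets)]
learner = oversample_stream(CountingLearner(), stream)
```

On this six-instance stream the learner is called 11 times: the early value 8.0 is repeated three times because the statistics are still based on only two observations, which illustrates why the estimates are more reliable after a warm-up period.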

Code Reference

Source Location

Signature

class ChebyshevUnderSampler(
    regressor: base.Regressor,
    sp: float = 0.15,
    seed: int | None = None,
)

class ChebyshevOverSampler(
    regressor: base.Regressor
)

Import

from river import imblearn

I/O Contract

Input
Parameter | Type | Description
x | dict | Feature dictionary
y | float | Target value for regression

Output
Method | Return Type | Description
predict_one(x) | float | Delegates to wrapped regressor
learn_one(x, y) | None | Selective/repeated training based on rarity

Usage Examples

from river import datasets
from river import evaluate
from river import imblearn
from river import metrics
from river import preprocessing
from river import rules

# Under-sampling example
model = (
    preprocessing.StandardScaler() |
    imblearn.ChebyshevUnderSampler(
        regressor=rules.AMRules(n_min=50, delta=0.01),
        seed=42
    )
)

result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE(),
    print_every=500
)
print(result)  # MAE: 1.515236

# Over-sampling example
model = (
    preprocessing.StandardScaler() |
    imblearn.ChebyshevOverSampler(
        regressor=rules.AMRules(n_min=50, delta=0.01)
    )
)

result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE(),
    print_every=500
)
print(result)  # MAE: 1.66253

Related Pages
