Implementation:Online ml River Imblearn ChebyshevSampler
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Imbalanced_Learning, Regression, Sampling |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Chebyshev-based sampling methods for imbalanced regression that use Chebyshev's inequality to identify and bias sampling toward rare target values.
Description
These methods apply Chebyshev's inequality to decide whether a target value is rare (far from the mean) or frequent (close to the mean). For a value y, the deviation is t = |y - mean| / std, and values with t > 1 are treated as potentially rare. For those values, ChebyshevUnderSampler evaluates the Chebyshev bound sigma^2 / |y - mean|^2 (equivalently 1 / t^2), which shrinks as y moves away from the mean, and derives the selection probability from it so that frequent values are under-sampled while rare ones are preserved. A "second chance" mechanism (the sp parameter, default 0.15) occasionally lets frequent cases through for training. ChebyshevOverSampler uses ceil(t) as a repetition weight, training the model multiple times on rare instances. Both samplers maintain running estimates of the target mean and variance.
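The quantities named above can be computed by hand to build intuition. The following is a minimal sketch, not River's implementation: the function name `chebyshev_scores` is hypothetical, and capping the bound at 1 for t <= 1 and returning a weight of 0 when the standard deviation is zero are assumptions made for illustration.

```python
import math


def chebyshev_scores(y, mean, std):
    """Return (t, bound, kappa) for a target y.

    t     : standardized deviation |y - mean| / std
    bound : Chebyshev bound sigma^2 / |y - mean|^2 = 1 / t^2 (capped at 1)
    kappa : ceil(t), the repetition weight described for over-sampling
    """
    if std <= 0:
        # Assumption: with no spread yet, nothing can be called rare.
        return 0.0, 1.0, 0
    t = abs(y - mean) / std
    # The bound is only informative for t > 1; below that it is capped at 1.
    bound = 1.0 if t <= 1 else 1.0 / (t * t)
    kappa = math.ceil(t)
    return t, bound, kappa


# With mean 10 and std 2, a target of 16 is three standard deviations away.
t, bound, kappa = chebyshev_scores(16.0, mean=10.0, std=2.0)
print(t, round(bound, 4), kappa)  # 3.0 0.1111 3
```

A target of 11 under the same statistics gives t = 0.5, so it counts as frequent: the bound stays at 1 and the repetition weight is 1.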
Usage
Use ChebyshevUnderSampler when frequent target values crowd out rare ones in an imbalanced regression problem and the computational budget is limited. Use ChebyshevOverSampler when you can afford additional computation and want to emphasize rare values through repetition. Both methods adapt to the target distribution automatically, so no rarity threshold has to be specified by hand. The sp parameter (under-sampler only) controls how conservatively frequent cases are discarded. Both methods work best after an initial warm-up period, once the running mean and variance have stabilized. They suit applications such as demand forecasting, financial prediction, or environmental monitoring, where extreme values are informative.
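The warm-up remark follows from how running statistics behave: with only a handful of targets seen, the variance estimate is zero or noisy, so any rarity score built on it is unreliable. Below is a minimal sketch of an incremental mean and variance using Welford's algorithm. The class name `RunningStats` is hypothetical, and the use of the sample variance (dividing by n - 1) is an assumption for illustration, not a statement about River's internals.

```python
class RunningStats:
    """Incremental mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self._m2 += delta * (y - self.mean)

    @property
    def variance(self):
        # Undefined for fewer than two observations; report 0.0 instead.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for y in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(y)
print(stats.mean, round(stats.variance, 4))  # 5.0 4.5714
```

After a single observation the variance is still 0.0, which is why a sampler built on these statistics cannot judge rarity at the very start of a stream.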
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/imblearn/chebyshev.py
Signature
class ChebyshevUnderSampler(
    regressor: base.Regressor,
    sp: float = 0.15,
    seed: int | None = None,
)
class ChebyshevOverSampler(
    regressor: base.Regressor
)
Import
from river import imblearn
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| x | dict | Feature dictionary |
| y | float | Target value for regression |
| Method | Return Type | Description |
|---|---|---|
| predict_one(x) | float | Delegates to wrapped regressor |
| learn_one(x, y) | None | Selective/repeated training based on rarity |
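The contract above (prediction delegates to the wrapped regressor, learning repeats according to rarity) can be illustrated with a self-contained sketch of the over-sampling side. This is not River's code: `CountingRegressor` and `OverSamplerSketch` are hypothetical names, the statistics are updated after the training step by assumption, and training at least once per instance (`max(1, ceil(t))`) is also an assumption.

```python
import math


class CountingRegressor:
    """Hypothetical stand-in for a wrapped regressor; it only counts updates."""

    def __init__(self):
        self.n_updates = 0

    def predict_one(self, x):
        return 0.0

    def learn_one(self, x, y):
        self.n_updates += 1


class OverSamplerSketch:
    """Repeats training ceil(t) times, where t = |y - mean| / std."""

    def __init__(self, regressor):
        self.regressor = regressor
        self._n = 0
        self._mean = 0.0
        self._m2 = 0.0

    def predict_one(self, x):
        # Prediction is delegated unchanged to the wrapped regressor.
        return self.regressor.predict_one(x)

    def learn_one(self, x, y):
        var = self._m2 / (self._n - 1) if self._n > 1 else 0.0
        std = math.sqrt(var)
        if std > 0:
            t = abs(y - self._mean) / std
            repeats = max(1, math.ceil(t))  # assumption: always train once
        else:
            repeats = 1  # no usable statistics yet
        for _ in range(repeats):
            self.regressor.learn_one(x, y)
        # Assumption: update the running statistics after training (Welford).
        self._n += 1
        delta = y - self._mean
        self._mean += delta / self._n
        self._m2 += delta * (y - self._mean)


model = OverSamplerSketch(CountingRegressor())
for y in [10.0, 12.0, 8.0, 10.0, 20.0]:
    model.learn_one({"f": 1.0}, y)
print(model.regressor.n_updates)
```

In this stream the final target, 20, lies far from the running mean of 10, so it triggers several repeated updates, while the near-mean targets are learned once each.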
Usage Examples
from river import datasets
from river import evaluate
from river import imblearn
from river import metrics
from river import preprocessing
from river import rules
# Under-sampling example
model = (
    preprocessing.StandardScaler() |
    imblearn.ChebyshevUnderSampler(
        regressor=rules.AMRules(n_min=50, delta=0.01),
        seed=42
    )
)
result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE(),
    print_every=500
)
print(result) # MAE: 1.515236
# Over-sampling example
model = (
    preprocessing.StandardScaler() |
    imblearn.ChebyshevOverSampler(
        regressor=rules.AMRules(n_min=50, delta=0.01)
    )
)
result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE(),
    print_every=500
)
print(result) # MAE: 1.66253