Implementation:Online ml River Imblearn RandomSampler
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Imbalanced_Learning, Classification, Sampling |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Random sampling wrappers (under-sampling, over-sampling, and mixed) that adjust the class distribution a classifier is trained on so that it matches a desired target, using online rejection sampling and Poisson-based repetition.
Description
These wrappers modify the class distribution seen by the underlying classifier. RandomUnderSampler uses rejection sampling to skip a random share of majority-class instances. RandomOverSampler uses Poisson sampling to train several times on minority-class instances. RandomSampler combines both approaches and adds a sampling_rate parameter. All methods keep running counts of the observed class distribution and compare them with the desired distribution. They dynamically identify a pivot class and compute acceptance or repetition rates relative to it, recalculating the pivot when the observed distribution shifts. Under-sampling randomly discards instances to match the desired ratios, over-sampling replicates instances, and mixed sampling does both.
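The rejection-sampling idea behind under-sampling can be sketched in plain Python. This is a minimal illustration, not the library's code: `CountingClassifier` and `UnderSamplerSketch` are hypothetical names, the pivot is recomputed on every call for simplicity, and the stream is synthetic.

```python
import collections
import random


class CountingClassifier:
    """Hypothetical stand-in for a classifier: it only records the labels it is trained on."""

    def __init__(self):
        self.seen = collections.Counter()

    def learn_one(self, x, y):
        self.seen[y] += 1


class UnderSamplerSketch:
    """Illustrative rejection-sampling wrapper; not the library's actual code."""

    def __init__(self, classifier, desired_dist, seed=None):
        self.classifier = classifier
        self.desired_dist = desired_dist
        self._actual = collections.Counter()  # running count per observed class
        self._rng = random.Random(seed)

    def learn_one(self, x, y):
        self._actual[y] += 1
        f, g = self.desired_dist, self._actual

        # Pivot: the class that is rarest relative to its target share.
        # Recomputed on every call here for simplicity (an assumption).
        pivot = max(g, key=lambda c: f[c] / g[c])
        if y == pivot:
            self.classifier.learn_one(x, y)  # the pivot class is always kept
            return

        # Acceptance probability relative to the pivot; never above 1.
        ratio = (f[y] / g[y]) / (f[pivot] / g[pivot])
        if self._rng.random() < ratio:
            self.classifier.learn_one(x, y)


# Synthetic stream: every tenth label is True, so the minority share is 10%.
stub = CountingClassifier()
sampler = UnderSamplerSketch(stub, desired_dist={False: 0.5, True: 0.5}, seed=42)
for i in range(10_000):
    sampler.learn_one({"i": i}, i % 10 == 0)

trained = sum(stub.seen.values())
print({label: round(n / trained, 3) for label, n in stub.seen.items()})
```

In this sketch every minority instance reaches the classifier, while majority instances are accepted with a probability that shrinks as their surplus grows, so the trained-on distribution drifts toward the 50/50 target.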
Usage
Use RandomUnderSampler when majority-class data is abundant and you want to reduce computational cost while balancing classes. Use RandomOverSampler when minority-class data is scarce and you want to emphasize it through repetition. Use RandomSampler for general-purpose rebalancing, with sampling_rate controlling overall data usage (< 1 to train on less data, > 1 to train more). Set desired_dist to the target class proportions, which must sum to 1. These methods are most useful when the class imbalance is significant. For binary classification with a 10% minority class, try desired_dist={False: 0.5, True: 0.5} or {False: 0.4, True: 0.6}, depending on how strongly you want to favor the minority class.
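To get a feel for what these settings imply, the rates can be worked out by hand for a stream with a 10% minority class. The helper functions below are hypothetical and rest on assumed formulas: acceptance and repetition rates are taken relative to a pivot class, and the mixed rate is assumed to be sampling_rate times the desired share divided by the observed share. Treat the numbers as an approximation of the behavior described above, not as the library's exact computation.

```python
def keep_probabilities(desired, actual):
    """Under-sampling: chance of training on one instance of each class."""
    # Pivot: the class that is rarest relative to its target share.
    pivot = max(actual, key=lambda c: desired[c] / actual[c])
    scale = desired[pivot] / actual[pivot]
    return {c: min(1.0, desired[c] / (scale * actual[c])) for c in actual}


def repetition_rates(desired, actual):
    """Over-sampling: expected training passes per instance of each class."""
    # Pivot: the class that is most abundant relative to its target share.
    pivot = max(actual, key=lambda c: actual[c] / desired[c])
    scale = actual[pivot] / desired[pivot]
    return {c: scale * desired[c] / actual[c] for c in actual}


def mixed_rates(desired, actual, sampling_rate):
    """Mixed sampling (assumed form): expected training passes per instance."""
    return {c: sampling_rate * desired[c] / actual[c] for c in actual}


actual = {False: 0.9, True: 0.1}  # observed shares: 10% minority class
for desired in ({False: 0.5, True: 0.5}, {False: 0.4, True: 0.6}):
    print(desired)
    print("  keep:  ", keep_probabilities(desired, actual))
    print("  repeat:", repetition_rates(desired, actual))
    print("  mixed: ", mixed_rates(desired, actual, sampling_rate=0.8))
```

Under these assumptions, a 50/50 target keeps roughly one majority instance in nine when under-sampling, and trains on each minority instance about nine times on average when over-sampling. With the mixed formula, the average number of training passes per incoming instance equals sampling_rate.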
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/imblearn/random.py
Signature
class RandomUnderSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    seed: int | None = None,
)
class RandomOverSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    seed: int | None = None,
)
class RandomSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    sampling_rate=1.0,
    seed: int | None = None,
)
Import
from river import imblearn
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| x | dict | Feature dictionary |
| y | Any | Class label (any hashable type) |

| Method | Return Type | Description |
|---|---|---|
| predict_one(x) | Any | Delegates to wrapped classifier |
| predict_proba_one(x) | dict | Delegates to wrapped classifier |
| learn_one(x, y) | None | Selective/repeated training based on class |
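The contract above (predictions delegated unchanged, training applied selectively or repeatedly) can be illustrated with a small over-sampling wrapper. This is a sketch under stated assumptions: `MajorityStub`, `OverSamplerSketch`, and `poisson` are hypothetical names, the Poisson draw uses Knuth's method (suitable only for small rates), and the pivot is recomputed on every call.

```python
import collections
import math
import random


def poisson(rate, rng):
    """Draw a Poisson-distributed integer with Knuth's method (small rates only)."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class MajorityStub:
    """Hypothetical classifier that simply counts the labels it is trained on."""

    def __init__(self):
        self.counts = collections.Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def predict_proba_one(self, x):
        total = sum(self.counts.values())
        return {c: n / total for c, n in self.counts.items()} if total else {}


class OverSamplerSketch:
    """Illustrative wrapper: delegates predictions, repeats minority training."""

    def __init__(self, classifier, desired_dist, seed=None):
        self.classifier = classifier
        self.desired_dist = desired_dist
        self._actual = collections.Counter()
        self._rng = random.Random(seed)

    def predict_one(self, x):
        return self.classifier.predict_one(x)  # pure delegation

    def predict_proba_one(self, x):
        return self.classifier.predict_proba_one(x)  # pure delegation

    def learn_one(self, x, y):
        self._actual[y] += 1
        f, g = self.desired_dist, self._actual
        # Pivot: the class most abundant relative to its target share.
        pivot = max(g, key=lambda c: g[c] / f[c])
        if y == pivot:
            self.classifier.learn_one(x, y)  # pivot instances are used once
            return
        rate = (g[pivot] / f[pivot]) * f[y] / g[y]  # expected repetitions, >= 1
        for _ in range(poisson(rate, self._rng)):
            self.classifier.learn_one(x, y)


stub = MajorityStub()
sampler = OverSamplerSketch(stub, desired_dist={False: 0.5, True: 0.5}, seed=42)
for i in range(2_000):
    sampler.learn_one({"i": i}, i % 10 == 0)  # 10% minority stream

print(sampler.predict_proba_one({}))
```

Because the stub's probabilities are just the shares of labels it was trained on, the printed dictionary shows how close the repeated training brings the effective distribution to the target.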
Usage Examples
from river import datasets
from river import evaluate
from river import imblearn
from river import linear_model
from river import metrics
from river import preprocessing
# Under-sampling example
model = imblearn.RandomUnderSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result) # LogLoss: 0.0336...
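# Over-sampling example
model = imblearn.RandomOverSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)
# take() returns a one-shot iterator, so build a fresh dataset and metric for this run
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result) # LogLoss: 0.0457...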
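# Mixed sampling example
model = imblearn.RandomSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    sampling_rate=0.8,
    seed=42
)
# take() returns a one-shot iterator, so build a fresh dataset and metric for this run
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result) # LogLoss: 0.09...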