Implementation:Online ml River Imblearn RandomSampler

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Imbalanced_Learning, Classification, Sampling
Last Updated	2026-02-08 16:00 GMT

Overview

Online random sampling wrappers (under-sampling, over-sampling, and mixed) that adjust the class distribution a classifier is trained on to match a desired target, by rejecting or repeating instances as they arrive.

Description

These wrappers modify the class distribution seen by the underlying classifier. RandomUnderSampler uses rejection sampling to skip a random share of majority class instances. RandomOverSampler uses Poisson sampling to train several times on minority class instances. RandomSampler combines both approaches and adds a sampling_rate parameter. All three keep running counts of the observed class distribution and compare them with the desired distribution. They dynamically identify a pivot class and compute acceptance or repetition rates relative to it, recalculating the pivot when the observed distribution shifts.
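The rejection step for under-sampling can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: the function name undersample_stream is hypothetical, the pivot is assumed to be the class with the largest desired-to-observed ratio, and every other class is assumed to be accepted with probability equal to its own ratio divided by the pivot's.

```python
import collections
import random


def undersample_stream(labels, desired_dist, seed=None):
    """Return how many instances of each class survive rejection sampling."""
    rng = random.Random(seed)
    seen = collections.Counter()  # observed class counts so far
    kept = collections.Counter()  # instances that would reach the classifier
    for y in labels:
        seen[y] += 1
        # Assumed pivot rule: the class most under-represented relative to
        # its target. Instances of the pivot class are always kept.
        pivot = max(seen, key=lambda c: desired_dist[c] / seen[c])
        if y == pivot:
            kept[y] += 1
            continue
        # Acceptance probability relative to the pivot; at most 1 because
        # the pivot has the largest desired/observed ratio.
        accept = (desired_dist[y] / seen[y]) / (desired_dist[pivot] / seen[pivot])
        if rng.random() < accept:
            kept[y] += 1
    return kept


# Simulated label stream with roughly 10% positives.
stream_rng = random.Random(0)
labels = [stream_rng.random() < 0.1 for _ in range(10_000)]
kept = undersample_stream(labels, {False: 0.5, True: 0.5}, seed=42)
minority_share = kept[True] / sum(kept.values())
```

With a 0.5/0.5 target, the surviving sample should end up close to balanced, while most majority class instances are skipped.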

Usage

Use RandomUnderSampler when you have abundant majority class data and want to reduce computational cost while balancing classes. Use RandomOverSampler when minority class data is precious and you want to emphasize it through repetition. Use RandomSampler for general-purpose rebalancing, with sampling_rate controlling overall data usage: values below 1 train on fewer instances than arrive, values above 1 train on more. Set desired_dist to the target class proportions, which must sum to 1. These methods work best when the class imbalance is significant. For binary classification with a 10% minority class, try desired_dist={False: 0.5, True: 0.5}, or {False: 0.4, True: 0.6} to favor the minority class more strongly.
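To see how desired_dist and sampling_rate interact, the arithmetic can be worked through for the 10% minority case. The sketch below assumes that the expected number of training steps per instance is sampling_rate times the ratio of desired to observed class share; the helper name expected_repeats is hypothetical and is not part of the library.

```python
import math


def expected_repeats(observed_counts, desired_dist, sampling_rate=1.0):
    """Expected training steps per instance of each class (assumed formula)."""
    if not math.isclose(sum(desired_dist.values()), 1.0):
        raise ValueError("desired_dist must sum to 1")
    total = sum(observed_counts.values())
    return {
        c: sampling_rate * desired_dist[c] / (observed_counts[c] / total)
        for c in observed_counts
    }


observed = {False: 900, True: 100}  # 10% minority class
rates = expected_repeats(observed, {False: 0.5, True: 0.5}, sampling_rate=0.8)

# Average number of training steps per incoming instance.
total = sum(observed.values())
usage = sum(rates[c] * observed[c] / total for c in observed)

print(rates)  # about 4 steps per True instance, about 0.44 per False instance
print(usage)  # about 0.8, i.e. the sampling_rate
```

Under this assumption each minority instance is used about four times, each majority instance a little under half the time, and the total amount of training per incoming instance equals sampling_rate.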

Code Reference

Source Location

Repository: Online_ml_River
File: river/imblearn/random.py

Signature

class RandomUnderSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    seed: int | None = None,
)

class RandomOverSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    seed: int | None = None,
)

class RandomSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    sampling_rate=1.0,
    seed: int | None = None,
)

Import

from river import imblearn

I/O Contract

Input
Parameter	Type	Description
x	dict	Feature dictionary
y	Any	Class label (any hashable type)

Output
Method	Return Type	Description
predict_one(x)	Any	Delegates to wrapped classifier
predict_proba_one(x)	dict	Delegates to wrapped classifier
learn_one(x, y)	None	Selective/repeated training based on class
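
The contract above (predictions delegated, training repeated or skipped per class) can be illustrated with a small self-contained wrapper. This is a sketch under assumptions, not the library's implementation: CountingClassifier and OverSamplingWrapper are hypothetical names, the pivot is assumed to be the most over-represented class, and the Poisson draw uses Knuth's multiplication method.

```python
import collections
import math
import random


def poisson(rate, rng):
    """Draw from a Poisson distribution using Knuth's multiplication method."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


class CountingClassifier:
    """Hypothetical stand-in: predicts the label it was trained on most often."""

    def __init__(self):
        self.counts = collections.Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OverSamplingWrapper:
    """Hypothetical wrapper: delegates predictions, repeats training on rare classes."""

    def __init__(self, classifier, desired_dist, seed=None):
        self.classifier = classifier
        self.desired_dist = desired_dist
        self._seen = collections.Counter()
        self._rng = random.Random(seed)

    def predict_one(self, x):
        return self.classifier.predict_one(x)  # pure delegation

    def learn_one(self, x, y):
        self._seen[y] += 1
        f, g = self.desired_dist, self._seen
        # Assumed pivot rule: the most over-represented class, trained on once.
        pivot = max(g, key=lambda c: g[c] / f[c])
        if y == pivot:
            self.classifier.learn_one(x, y)
            return
        # Expected number of repetitions; at least 1 by choice of pivot.
        rate = (g[pivot] / f[pivot]) * f[y] / g[y]
        for _ in range(poisson(rate, self._rng)):
            self.classifier.learn_one(x, y)


# Simulated stream with roughly 10% positives.
stream_rng = random.Random(0)
labels = [stream_rng.random() < 0.1 for _ in range(5000)]
model = OverSamplingWrapper(CountingClassifier(), {False: 0.5, True: 0.5}, seed=1)
for y in labels:
    model.learn_one({"amount": 1.0}, y)

trained = model.classifier.counts
minority_share = trained[True] / sum(trained.values())
```

Here the wrapped classifier sees each majority instance about once and each minority instance several times, so the class mix it trains on should be close to the 0.5/0.5 target.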

Usage Examples

from river import datasets
from river import evaluate
from river import imblearn
from river import linear_model
from river import metrics
from river import preprocessing

# Under-sampling example
model = imblearn.RandomUnderSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)

dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result)  # LogLoss: 0.0336...

# Over-sampling example
model = imblearn.RandomOverSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)

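# take() returns a one-shot iterator, so rebuild the dataset and use a fresh metric
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)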
print(result)  # LogLoss: 0.0457...

# Mixed sampling example
model = imblearn.RandomSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    sampling_rate=0.8,
    seed=42
)

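# Rebuild the dataset and metric again so the score reflects this model only
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)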
print(result)  # LogLoss: 0.09...
