

Implementation:Online ml River Imblearn RandomSampler

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Imbalanced_Learning, Classification, Sampling
Last Updated 2026-02-08 16:00 GMT

Overview

Random sampling wrappers for under-sampling, over-sampling, and a mix of both. They adjust the class distribution a classifier is trained on toward a desired target, deciding online whether to reject, accept, or repeat each incoming instance.

Description

These wrappers modify the class distribution seen by the underlying classifier. RandomUnderSampler uses rejection sampling to randomly skip majority class instances until the desired ratios are matched. RandomOverSampler draws a repetition count from a Poisson distribution and trains that many times on minority class instances. RandomSampler combines both approaches and adds a sampling_rate parameter, so it both discards and replicates instances.

All methods maintain running counts of the actual class distribution and compare them with the desired distribution. They dynamically identify a pivot class and compute acceptance or repetition rates relative to it. When the observed distribution shifts, the pivot is recalculated.
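The rejection step can be illustrated with a minimal, self-contained sketch. It is not River's implementation: the class name UnderSamplerSketch and its accept method are hypothetical, and the pivot rule used here (the class with the largest desired-to-observed ratio) is an assumption based on the description above.

```python
import random
from collections import Counter


class UnderSamplerSketch:
    """Toy online rejection sampler (hypothetical, not River's code)."""

    def __init__(self, desired_dist, seed=None):
        self.desired_dist = desired_dist
        self.actual_counts = Counter()  # running count of every label seen
        self.pivot = None
        self.rng = random.Random(seed)

    def accept(self, y):
        """Return True if an instance with label y should be trained on."""
        self.actual_counts[y] += 1
        f, g = self.desired_dist, self.actual_counts

        if y == self.pivot:
            return True  # the pivot class is always kept

        # Assumed pivot rule: the class that is scarcest relative to its target
        self.pivot = max(g, key=lambda c: f[c] / g[c])
        if y == self.pivot:
            return True

        # Keep the instance with a probability relative to the pivot class
        m = f[self.pivot] / g[self.pivot]
        ratio = f[y] / (m * g[y])
        return self.rng.random() < ratio


# Deterministic toy stream: every tenth label is True (10% minority)
sampler = UnderSamplerSketch({False: 0.5, True: 0.5}, seed=42)
kept = Counter()
for i in range(10_000):
    y = i % 10 == 0
    if sampler.accept(y):
        kept[y] += 1

print(kept)  # the two kept counts should come out roughly equal
```

In this sketch every minority instance is kept and majority instances are thinned until the kept counts approach the 50/50 target, which is the behavior the description attributes to under-sampling.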

Usage

Use RandomUnderSampler when majority class data is abundant and you want to reduce computational cost while balancing classes. Use RandomOverSampler when minority class data is precious and you want to emphasize it through repetition. Use RandomSampler for general-purpose rebalancing, where sampling_rate controls overall data usage: values below 1 train on less data, and values above 1 train on more. Set desired_dist to the target class proportions, which must sum to 1. These methods are most useful when the class imbalance is significant. For binary classification with a 10% minority class, try desired_dist={False: 0.5, True: 0.5}, or {False: 0.4, True: 0.6} to favor the minority class more strongly.
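To get a feel for what a given desired_dist implies, the following back-of-the-envelope sketch estimates the fraction of each class that under-sampling would keep in the 10% minority scenario. The helper under_sampling_keep_rates is hypothetical and not part of River; it assumes a stationary class distribution and the pivot-relative acceptance rule described above.

```python
def under_sampling_keep_rates(actual_dist, desired_dist):
    """Estimate the probability of keeping an instance of each class."""
    # Desired share divided by actual share; the largest value marks the pivot
    weights = {c: desired_dist[c] / actual_dist[c] for c in actual_dist}
    pivot_weight = max(weights.values())
    # The pivot class is kept with probability 1, the others proportionally less
    return {c: w / pivot_weight for c, w in weights.items()}


actual = {False: 0.9, True: 0.1}
for desired in ({False: 0.5, True: 0.5}, {False: 0.4, True: 0.6}):
    # desired_dist values must sum to 1
    assert abs(sum(desired.values()) - 1.0) < 1e-9
    rates = under_sampling_keep_rates(actual, desired)
    print(desired, {c: round(r, 3) for c, r in rates.items()})
```

Under these assumptions, a 50/50 target keeps about 11% of the majority class, and a 40/60 target keeps about 7%, which shows how quickly under-sampling reduces the amount of training data as the target favors the minority class.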

Code Reference

Source Location

Signature

class RandomUnderSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    seed: int | None = None,
)

class RandomOverSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    seed: int | None = None,
)

class RandomSampler(
    classifier: base.Classifier,
    desired_dist: dict,
    sampling_rate=1.0,
    seed: int | None = None,
)

Import

from river import imblearn

I/O Contract

Input
Parameter   Type   Description
x           dict   Feature dictionary
y           Any    Class label (any hashable type)

Output
Method                 Return Type   Description
predict_one(x)         Any           Delegates to wrapped classifier
predict_proba_one(x)   dict          Delegates to wrapped classifier
learn_one(x, y)        None          Selective/repeated training based on class
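The contract above can be sketched with a toy wrapper: predictions pass straight through to the wrapped classifier, while learn_one decides how many times to train. Everything here is illustrative and assumed rather than taken from River: MajorityClassStub, OverSamplerSketch, and the poisson helper are hypothetical names, the pivot rule is a guess consistent with the description, and every class (including the pivot) is trained a Poisson-distributed number of times for simplicity.

```python
import math
import random
from collections import Counter


class MajorityClassStub:
    """Hypothetical stand-in for a wrapped classifier; it only counts labels."""

    def __init__(self):
        self.seen = Counter()

    def learn_one(self, x, y):
        self.seen[y] += 1

    def predict_proba_one(self, x):
        total = sum(self.seen.values())
        return {c: n / total for c, n in self.seen.items()} if total else {}

    def predict_one(self, x):
        proba = self.predict_proba_one(x)
        return max(proba, key=proba.get) if proba else None


def poisson(rate, rng):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class OverSamplerSketch:
    """Toy Poisson over-sampler (hypothetical, not River's code)."""

    def __init__(self, classifier, desired_dist, seed=None):
        self.classifier = classifier
        self.desired_dist = desired_dist
        self.actual_counts = Counter()
        self.rng = random.Random(seed)

    def predict_one(self, x):
        return self.classifier.predict_one(x)  # pure delegation

    def predict_proba_one(self, x):
        return self.classifier.predict_proba_one(x)  # pure delegation

    def learn_one(self, x, y):
        self.actual_counts[y] += 1
        f, g = self.desired_dist, self.actual_counts
        # Assumed pivot rule: the most over-represented class gets rate 1
        pivot = max(g, key=lambda c: g[c] / f[c])
        rate = (g[pivot] / f[pivot]) * f[y] / g[y]
        for _ in range(poisson(rate, self.rng)):
            self.classifier.learn_one(x, y)


model = OverSamplerSketch(MajorityClassStub(), {False: 0.5, True: 0.5}, seed=42)
for i in range(10_000):
    model.learn_one({"i": i}, i % 10 == 0)  # 10% of labels are True

print(model.classifier.seen)  # counts seen by the wrapped classifier
```

Although only one label in ten is True, the wrapped stub ends up seeing the two classes in roughly equal numbers, because minority instances are repeated about nine times each on average.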

Usage Examples

from river import datasets
from river import evaluate
from river import imblearn
from river import linear_model
from river import metrics
from river import preprocessing

# Under-sampling example
model = imblearn.RandomUnderSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)

dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result)  # LogLoss: 0.0336...

# Over-sampling example
model = imblearn.RandomOverSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)

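# Rebuild the stream (take() returns a one-shot iterator) and use a fresh metric
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result)  # LogLoss: 0.0457...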

# Mixed sampling example
model = imblearn.RandomSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    sampling_rate=0.8,
    seed=42
)

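# Rebuild the stream (take() returns a one-shot iterator) and use a fresh metric
dataset = datasets.CreditCard().take(3000)
metric = metrics.LogLoss()
result = evaluate.progressive_val_score(dataset, model, metric)
print(result)  # LogLoss: 0.09...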
