Implementation:Online ml River Imblearn HardSampling

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Imbalanced_Learning, Active_Learning, Curriculum_Learning
Last Updated: 2026-02-08 16:00 GMT

Overview

Hard Sampling maintains a buffer of difficult-to-predict samples and probabilistically retrains on them, helping models learn from their mistakes.

Description

This method keeps a fixed-size buffer of the hardest samples seen so far, where hardness is measured by prediction loss. When a new sample arrives, its loss is computed with the current model. While the buffer holds fewer than size samples, the new sample is added directly; once the buffer is full, the new sample replaces the buffered sample with the smallest loss, but only if its own loss is larger. Then, with probability p, the model trains on a sample drawn at random from the buffer, and that sample's loss is recomputed afterward; with probability 1-p, it trains on the new incoming sample instead. The buffer is kept sorted by loss, so the easiest buffered sample (the next eviction candidate) is always at a known position. The result is a curriculum in which the model frequently revisits challenging examples. The method applies to regression, using a regression loss such as absolute error (MAE), and to classification, using cross-entropy or log loss.
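
The steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not River's implementation: the names HardSamplingSketch, RunningMean, and Triplet are hypothetical, the loss is fixed to absolute error, and size is assumed to be at least 1.

```python
import bisect
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Triplet:
    """Buffered sample (hypothetical name); ordering compares the loss only."""
    loss: float
    x: dict = field(compare=False)
    y: float = field(compare=False)


class RunningMean:
    """Toy regressor (hypothetical): predicts the mean of the targets it trained on."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


class HardSamplingSketch:
    """Illustrative hard-sampling wrapper; assumes size >= 1."""

    def __init__(self, model, size, p, seed=None):
        self.model = model
        self.size = size
        self.p = p
        self.buffer = []  # sorted by ascending loss, easiest sample first
        self._rng = random.Random(seed)

    def _loss(self, x, y):
        # Absolute error, chosen here for illustration only.
        return abs(y - self.model.predict_one(x))

    def predict_one(self, x):
        # Prediction is delegated to the wrapped model.
        return self.model.predict_one(x)

    def learn_one(self, x, y):
        loss = self._loss(x, y)
        if len(self.buffer) < self.size:
            # Buffer not full yet: always keep the new sample.
            bisect.insort(self.buffer, Triplet(loss, x, y))
        elif loss > self.buffer[0].loss:
            # Buffer full: evict the easiest sample if the new one is harder.
            self.buffer.pop(0)
            bisect.insort(self.buffer, Triplet(loss, x, y))

        if self._rng.random() < self.p:
            # Replay: train on a random buffered sample, then refresh its loss.
            t = self.buffer.pop(self._rng.randrange(len(self.buffer)))
            self.model.learn_one(t.x, t.y)
            bisect.insort(self.buffer, Triplet(self._loss(t.x, t.y), t.x, t.y))
        else:
            # Otherwise train on the incoming sample.
            self.model.learn_one(x, y)


model = HardSamplingSketch(RunningMean(), size=3, p=0.5, seed=42)
for i, y in enumerate([1.0, 2.0, 50.0, 3.0, 2.5, 60.0, 1.5]):
    model.learn_one({"i": i}, y)
print(len(model.buffer), model.model.n)
```

Whatever the random draws are, the sketch performs exactly one training update per call to learn_one, and the buffer never grows beyond size. Keeping the list sorted means the eviction candidate is always at index 0.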

Usage

Use Hard Sampling when your model struggles with certain types of examples and you want it to spend more updates on those patterns. The size parameter sets the buffer capacity: a larger buffer remembers more hard cases but uses more memory. The p parameter sets the trade-off between replaying hard samples and learning from new ones; a higher p means more updates come from the buffer and fewer from incoming samples. The technique is particularly effective for imbalanced problems, where the model tends to favor the majority class, and for data with distinct difficulty levels. The loss function defines what counts as hard, so choose one that matches the problem type (a regression loss for HardSamplingRegressor, a binary or multiclass loss for HardSamplingClassifier) and the kind of error you want to emphasize.
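
To get a feel for p, the short simulation below counts how often a replay step would be chosen over a stream of a given length. It is a sketch only: replay_fraction is a hypothetical helper, and it assumes the replay decision is an independent random draw at every step, as described above.

```python
import random


def replay_fraction(p, n_steps, seed=None):
    """Share of steps on which a buffer sample would be replayed (hypothetical helper)."""
    rng = random.Random(seed)
    # One independent draw per step; a draw below p means "replay from the buffer".
    replays = sum(rng.random() < p for _ in range(n_steps))
    return replays / n_steps


frac = replay_fraction(p=0.2, n_steps=10_000, seed=42)
print(f"share of updates taken from the buffer: {frac:.3f}")
```

The share comes out close to p, so with p=0.2 roughly one incoming sample in five is not trained on when it arrives, although it can still enter the buffer and be replayed later.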

Code Reference

Source Location

Signature

class HardSamplingRegressor(
    regressor: base.Regressor,
    size: int,
    p: float,
    loss: optim.losses.RegressionLoss | None = None,
    seed: int | None = None,
)

class HardSamplingClassifier(
    classifier: base.Classifier,
    size: int,
    p: float,
    loss: optim.losses.BinaryLoss | optim.losses.MultiClassLoss | None = None,
    seed: int | None = None,
)

Import

from river import imblearn

I/O Contract

Input
Parameter  Type  Description
x          dict  Feature dictionary
y          Any   Target value or class label

Output
Method                Return Type  Description
predict_one(x)        Any          Delegates to the wrapped model
predict_proba_one(x)  dict         Class probabilities (classifier only)
learn_one(x, y)       None         Trains on a buffer sample with probability p, otherwise on the new sample

Usage Examples

from river import datasets
from river import evaluate
from river import imblearn
from river import linear_model
from river import metrics
from river import preprocessing

# Regression example
model = (
    preprocessing.StandardScaler() |
    imblearn.HardSamplingRegressor(
        regressor=linear_model.LinearRegression(),
        p=0.2,
        size=30,
        seed=42,
    )
)

result = evaluate.progressive_val_score(
    dataset=datasets.TrumpApproval(),
    model=model,
    metric=metrics.MAE(),
    print_every=500,
)
print(result)  # MAE: 1.391246

# Classification example
model = (
    preprocessing.StandardScaler() |
    imblearn.HardSamplingClassifier(
        classifier=linear_model.LogisticRegression(),
        p=0.1,
        size=40,
        seed=42,
    )
)

result = evaluate.progressive_val_score(
    dataset=datasets.Phishing(),
    model=model,
    metric=metrics.ROCAUC(),
    print_every=500,
)
print(result)  # ROCAUC: 95.06%
