
Implementation:Online ml River Active EntropySampler

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Active_Learning, Uncertainty_Sampling, Classification
Last Updated: 2026-02-08 16:00 GMT

Overview

An active learning classifier that selectively requests labels based on prediction entropy, prioritizing uncertain samples.

Description

EntropySampler implements uncertainty sampling with entropy as the uncertainty measure. It computes the normalized entropy of the predicted probability distribution and raises it to the power of discount_factor to obtain a selection probability, so higher entropy (more uncertainty) means a higher probability of requesting a label. The discount factor controls how aggressively samples are selected: a value of 0 selects every sample, a value of 1 leaves the normalized entropy unchanged, and larger values are more selective. The entropy is normalized by log2 of the number of classes with non-zero probability, which keeps it in [0, 1].
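The computation described above can be sketched in plain Python. This is a minimal illustration, not River's actual source: the function name `selection_probability` and the handling of degenerate predictions (fewer than two classes with non-zero probability yield 0.0) are assumptions.

```python
import math


def selection_probability(y_pred: dict, discount_factor: float = 3) -> float:
    """Hypothetical helper: normalized entropy raised to discount_factor."""
    # Keep only classes with non-zero probability; 0 * log2(0) is treated as 0.
    probs = [p for p in y_pred.values() if p > 0]
    if len(probs) < 2:
        # Assumption: a prediction with a single possible class carries no uncertainty.
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs)
    # Dividing by log2(number of non-zero classes) maps the entropy into [0, 1].
    normalized = entropy / math.log2(len(probs))
    return normalized ** discount_factor


uniform = selection_probability({"ham": 0.5, "spam": 0.5})
confident = selection_probability({"ham": 0.9, "spam": 0.1})
print(uniform, confident)
```

A maximally uncertain two-class prediction gives a selection probability of 1, while a 0.9/0.1 prediction gives roughly 0.10 at the default discount factor.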

Usage

Use EntropySampler when labels are expensive and you want to focus labeling effort on uncertain predictions. The discount_factor parameter tunes the label request rate. Start with the default value of 3 and adjust it based on your labeling budget and the model performance you need.
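To get a feel for the parameter before spending labels, one can tabulate the selection probability for a fixed normalized entropy. The sketch below assumes a hypothetical prediction whose normalized entropy is 0.8; the helper name `label_probability` is illustrative and not part of River.

```python
def label_probability(normalized_entropy: float, discount_factor: float) -> float:
    """Hypothetical helper: probability of requesting a label."""
    return normalized_entropy ** discount_factor


# Assumed example: a fairly uncertain prediction with normalized entropy 0.8.
rates = {d: label_probability(0.8, d) for d in (0, 1, 3, 5)}
for d, p in rates.items():
    print(f"discount_factor={d}: p={p:.3f}")
```

With discount_factor=0 every sample is selected, and the probability falls to roughly 0.51 at 3 and 0.33 at 5, which is why larger values suit tighter labeling budgets.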

Code Reference

Source Location

Signature

class EntropySampler(ActiveLearningClassifier):
    def __init__(
        self,
        classifier: base.Classifier,
        discount_factor: float = 3,
        seed=None
    ):
        ...

    def _ask_for_label(self, x, y_pred) -> bool:
        return self._rng.random() < self._p(y_pred)
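The decision in `_ask_for_label` amounts to a Bernoulli draw: a uniform random number is compared with the selection probability. A standalone sketch follows; it assumes the sampler's internal generator behaves like a seeded `random.Random`, and `simulate_requests` is a hypothetical name used only for illustration.

```python
import random


def simulate_requests(probabilities, seed=None):
    """Hypothetical helper: one ask/skip decision per selection probability."""
    rng = random.Random(seed)
    # random() returns a float in [0, 1), so p=1 always asks and p=0 never does.
    return [rng.random() < p for p in probabilities]


probs = [0.0, 0.1, 0.5, 0.9, 1.0]
first = simulate_requests(probs, seed=42)
second = simulate_requests(probs, seed=42)
print(first == second)  # same seed, same decisions
```

Fixing the seed makes the sequence of label requests repeatable, which is useful when comparing discount factors on the same stream.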

Import

from river import active

I/O Contract

Parameter | Type | Description
classifier | base.Classifier | The classifier to wrap
discount_factor | float (default: 3) | Exponent applied to the normalized entropy; higher values request fewer labels
seed | int (optional) | Random seed for reproducibility

Usage Examples

from river import active
from river import datasets
from river import feature_extraction
from river import linear_model
from river import metrics

dataset = datasets.SMSSpam()
metric = metrics.Accuracy()

model = (
    feature_extraction.TFIDF(on='body') |
    linear_model.LogisticRegression()
)
model = active.EntropySampler(model, discount_factor=3, seed=42)

n_samples_used = 0
for x, y in dataset:
    # predict_one returns the prediction and a flag saying whether a label is requested
    y_pred, ask = model.predict_one(x)
    metric.update(y, y_pred)
    if ask:
        # Learn only from the samples the sampler asked to have labeled
        n_samples_used += 1
        model.learn_one(x, y)

print(metric)  # Accuracy: 86.60%
print(f"Used {n_samples_used}/{dataset.n_samples} samples")
print(f"Label rate: {n_samples_used / dataset.n_samples:.2%}")
# Label rate: 34.46%

# More selective sampling (a higher discount factor requests fewer labels)
model_selective = active.EntropySampler(
    feature_extraction.TFIDF(on='body') | linear_model.LogisticRegression(),
    discount_factor=5,
    seed=42
)
