Implementation:Online ml River ModelSelection Bandit

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Model_Selection, Multi_Armed_Bandits
Last Updated 2026-02-08 16:00 GMT

Overview

Bandit-based model selection treats each candidate model as a bandit arm, using exploration-exploitation strategies to dynamically choose which models to train at each step.

Description

This approach associates each candidate model with one arm of a multi-armed bandit. At each learning step, the bandit policy decides which model(s) to update based on their past performance, and it maintains reward statistics for each model computed with the provided metric. During the burn-in period, every model is pulled at least once to ensure fair initial exploration. After burn-in, the policy balances exploration (trying different models) with exploitation (focusing on the best-performing ones). Predictions always come from the current best model, as judged by accumulated rewards. The policy can be epsilon-greedy, UCB, Thompson sampling, or another bandit algorithm.
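The loop described above can be sketched in plain Python. This is an illustrative sketch only, not River's implementation: the names RunningMAE, ConstantModel, and EpsilonGreedySelector are hypothetical, the reward is assumed to be a running mean absolute error (lower is better), and the policy is assumed to update one model per step.

```python
import random


class RunningMAE:
    """Running mean absolute error; lower is better (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, y_true, y_pred):
        self.n += 1
        self.total += abs(y_true - y_pred)

    def get(self):
        # An arm that was never pulled is treated as the worst possible.
        return self.total / self.n if self.n else float("inf")


class ConstantModel:
    """Toy regressor (hypothetical): always predicts a fixed value."""

    def __init__(self, value):
        self.value = value

    def predict_one(self, x):
        return self.value

    def learn_one(self, x, y):
        pass  # nothing to fit in this toy model


class EpsilonGreedySelector:
    """Each model is an arm; illustrative sketch, not River's code."""

    def __init__(self, models, epsilon=0.1, burn_in=1, seed=None):
        self.models = models
        self.errors = [RunningMAE() for _ in models]
        self.epsilon = epsilon
        self.burn_in = burn_in
        self.rng = random.Random(seed)

    @property
    def best_index(self):
        return min(range(len(self.models)), key=lambda i: self.errors[i].get())

    @property
    def best_model(self):
        return self.models[self.best_index]

    def _pull(self):
        # Burn-in: any arm pulled fewer than burn_in times goes first.
        for i, err in enumerate(self.errors):
            if err.n < self.burn_in:
                return i
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.models))
        return self.best_index

    def predict_one(self, x):
        # Predictions always come from the current best model.
        return self.best_model.predict_one(x)

    def learn_one(self, x, y):
        i = self._pull()
        model = self.models[i]
        # Score before learning so the statistic reflects unseen data.
        self.errors[i].update(y, model.predict_one(x))
        model.learn_one(x, y)


selector = EpsilonGreedySelector(
    [ConstantModel(0.0), ConstantModel(5.0), ConstantModel(10.0)],
    epsilon=0.1,
    burn_in=5,
    seed=42,
)
for _ in range(200):
    selector.learn_one({}, 5.0)  # the target is always 5.0

print(selector.predict_one({}))  # 5.0, from the model with zero error
```

Only the pulled model is scored and updated at each step, which is why the cost per sample stays close to that of training a single model in this sketch.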

Usage

Use bandit model selection when you want model selection that adapts as stream characteristics change, rather than committing to one model up front. It is particularly effective when different models perform better on different segments of the data stream. The epsilon-greedy policy with decay is a good default: it starts with more exploration and gradually shifts toward exploitation. Set burn_in so that each model receives enough initial samples. Bandit selection has lower overhead than successive halving and adapts continuously rather than in discrete elimination rounds.
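To get a feel for how a decayed epsilon shifts from exploration to exploitation, the sketch below assumes an exponential schedule, epsilon * exp(-decay * n), where n is the number of steps taken so far. The function name decayed_epsilon is hypothetical, and the exact schedule used by a given policy should be checked against its documentation.

```python
import math


def decayed_epsilon(epsilon, decay, n_steps):
    """Exploration rate after n_steps, under an assumed exponential decay."""
    return epsilon * math.exp(-decay * n_steps)


# With epsilon=0.1 and decay=0.001, exploration fades over a few thousand steps.
for n in (0, 100, 1000, 5000):
    print(n, decayed_epsilon(0.1, 0.001, n))
```

Under this assumption, a larger decay commits to the current best model sooner, which helps on stationary streams but slows the reaction to a change in which model is best.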

Code Reference

Source Location

Signature

class BanditRegressor(
    models,
    metric: metrics.base.RegressionMetric,
    policy: bandit.base.Policy,
)

class BanditClassifier(
    models,
    metric: metrics.base.ClassificationMetric,
    policy: bandit.base.Policy,
)

Import

from river import model_selection

I/O Contract

Input
Parameter   Type   Description
models      list   List of model instances to select from
x           dict   Feature dictionary
y           Any    Target value

Output
Method                 Return Type   Description
predict_one(x)         Any           Prediction from best model
predict_proba_one(x)   dict          Probabilities from best model (classifier)
learn_one(x, y)        None          Updates selected models based on policy
best_model             Estimator     Currently best performing model

Usage Examples

from river import bandit
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import model_selection
from river import optim
from river import preprocessing

# Create candidate models
models = [
    linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
    for lr in [0.0001, 0.001, 1e-05, 0.01]
]

# Apply bandit selection
model = (
    preprocessing.StandardScaler() |
    model_selection.BanditRegressor(
        models,
        metric=metrics.MAE(),
        policy=bandit.EpsilonGreedy(
            epsilon=0.1,
            decay=0.001,
            burn_in=100,
            seed=42
        )
    )
)

# Evaluate
result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE()
)
print(result)  # MAE: 3.134089
