Implementation:Online ml River ModelSelection Bandit

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Model_Selection, Multi_Armed_Bandits
Last Updated	2026-02-08 16:00 GMT

Overview

Bandit-based model selection treats each candidate model as a bandit arm, using exploration-exploitation strategies to dynamically choose which models to train at each step.

Description

This approach associates each model with an arm in a multi-armed bandit problem. At each learning step, the bandit policy decides which model(s) to update based on their past performance. The policy maintains reward statistics for each model, computed with the provided metric. During the burn-in period, every model is pulled at least once so that initial exploration is fair. After burn-in, the policy balances exploration (trying different models) against exploitation (focusing on the best-performing ones). Predictions always come from the current best model, as ranked by accumulated rewards. The policy can be epsilon-greedy, UCB, Thompson sampling, or another bandit algorithm.
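The selection loop described above can be sketched in plain Python. This is a toy illustration, not River's implementation: the names ConstantModel and EpsilonGreedySelector, the round-robin burn-in, and the use of negative absolute error as the reward are all assumptions made for the example.

```python
import random


class ConstantModel:
    """Hypothetical toy regressor that always predicts a fixed value."""

    def __init__(self, value):
        self.value = value

    def predict_one(self, x):
        return self.value

    def learn_one(self, x, y):
        pass  # a real model would update its parameters here


class EpsilonGreedySelector:
    """Toy epsilon-greedy model selection (illustrative, not River's code)."""

    def __init__(self, models, epsilon=0.1, burn_in=10, seed=None):
        self.models = models
        self.epsilon = epsilon
        self.burn_in = burn_in
        self.rng = random.Random(seed)
        self.pulls = [0] * len(models)          # times each arm was pulled
        self.mean_reward = [0.0] * len(models)  # running mean reward per arm
        self.steps = 0

    @property
    def best_index(self):
        # Arms that were never pulled are ranked last.
        def score(i):
            return self.mean_reward[i] if self.pulls[i] else float("-inf")
        return max(range(len(self.models)), key=score)

    @property
    def best_model(self):
        return self.models[self.best_index]

    def _pull(self):
        if self.steps < self.burn_in:
            # Burn-in: cycle through the arms so each one gets pulled.
            return self.steps % len(self.models)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.models))  # explore
        return self.best_index                           # exploit

    def predict_one(self, x):
        return self.best_model.predict_one(x)

    def learn_one(self, x, y):
        arm = self._pull()
        model = self.models[arm]
        # Assumed reward: negative absolute error, so higher is better.
        reward = -abs(y - model.predict_one(x))
        self.pulls[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]
        model.learn_one(x, y)
        self.steps += 1


models = [ConstantModel(v) for v in (0.0, 5.0, 10.0)]
selector = EpsilonGreedySelector(models, epsilon=0.1, burn_in=6, seed=42)
for _ in range(200):
    selector.learn_one({}, 5.0)  # the target is always 5.0

best_value = selector.best_model.value
```

With a constant target of 5.0, the model that predicts 5.0 earns the highest mean reward, so it becomes the one used for predictions, while the other two are still pulled occasionally through exploration.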

Usage

Use bandit model selection when you want model selection that adapts as stream characteristics change, rather than committing to one model upfront. It is particularly effective when different models perform better on different segments of the data stream. An epsilon-greedy policy with decay is a good default: it starts with more exploration and gradually shifts toward exploitation. Set burn_in high enough that each model receives enough initial samples. Bandit selection has lower overhead than successive halving and adapts continuously rather than in discrete elimination rounds.
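To get a feel for how a decaying epsilon shifts from exploration to exploitation, the sketch below tabulates the exploration probability at a few step counts. It assumes an exponential schedule of the form epsilon * exp(-decay * n); the helper decayed_epsilon is hypothetical, and the exact formula used by a given River version should be checked against its documentation.

```python
import math


def decayed_epsilon(epsilon, decay, n):
    """Exploration probability after n steps (assumed exponential decay)."""
    return epsilon * math.exp(-decay * n)


# Same epsilon and decay values as the usage example on this page.
schedule = {n: decayed_epsilon(0.1, 0.001, n) for n in (0, 100, 1000, 5000)}

for n, eps in schedule.items():
    print(f"step {n:>5}: epsilon = {eps:.4f}")
```

Under this assumption, a decay of 0.001 cuts the exploration rate to roughly a third of its initial value after 1,000 steps, which is a useful reference point when choosing decay relative to the expected stream length.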

Code Reference

Source Location

Repository: Online_ml_River
File: river/model_selection/bandit.py

Signature

class BanditRegressor(
    models,
    metric: metrics.base.RegressionMetric,
    policy: bandit.base.Policy,
)

class BanditClassifier(
    models,
    metric: metrics.base.ClassificationMetric,
    policy: bandit.base.Policy,
)

Import

from river import model_selection

I/O Contract

Input
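Parameter	Type	Description
models	list	List of model instances to select from
metric	metrics.base.RegressionMetric or metrics.base.ClassificationMetric	Metric used to compute each model's reward
policy	bandit.base.Policy	Bandit policy that chooses which model(s) to update
x	dict	Feature dictionary passed to learn_one and the predict methods
y	Any	Target value passed to learn_one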

Output
Method / Property	Return Type	Description
predict_one(x)	Any	Prediction from best model
predict_proba_one(x)	dict	Probabilities from best model (classifier)
learn_one(x, y)	None	Updates selected models based on policy
best_model	Estimator	Currently best performing model

Usage Examples

from river import bandit
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import model_selection
from river import optim
from river import preprocessing

# Create candidate models
models = [
    linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
    for lr in [0.0001, 0.001, 1e-05, 0.01]
]

# Apply bandit selection
model = (
    preprocessing.StandardScaler() |
    model_selection.BanditRegressor(
        models,
        metric=metrics.MAE(),
        policy=bandit.EpsilonGreedy(
            epsilon=0.1,
            decay=0.001,
            burn_in=100,
            seed=42
        )
    )
)

# Evaluate
result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE()
)
print(result)  # MAE: 3.134089
