Implementation:Online ml River ModelSelection Bandit
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Model_Selection, Multi_Armed_Bandits |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Bandit-based model selection treats each candidate model as a bandit arm, using exploration-exploitation strategies to dynamically choose which models to train at each step.
Description
This approach associates each model with an arm in a multi-armed bandit problem. At each learning step, the bandit policy decides which model(s) to update, based on their past performance. The policy keeps reward statistics for each model, computed with the provided metric. During the burn-in phase, controlled by the policy's `burn_in` parameter, every model is pulled before the policy starts choosing on its own, so that each model gets a fair initial estimate. After burn-in, the policy balances exploration (trying different models) with exploitation (concentrating updates on the best-performing ones). Predictions always come from the current best model according to the accumulated rewards. The policy can be epsilon-greedy, UCB, Thompson sampling, or another bandit algorithm.
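The pull/update cycle described above can be sketched without any library. This is a minimal illustration under stated assumptions, not River's implementation: the class `EpsilonGreedyPolicy` and its methods are hypothetical, rewards are assumed to be "bigger is better" (so an error such as MAE is negated), and burn-in is modeled as "pull each arm `burn_in` times before choosing freely".

```python
import random


class EpsilonGreedyPolicy:
    """Hypothetical toy epsilon-greedy policy over arms 0..n_arms-1 (not River's class).

    Rewards are assumed to be "bigger is better"; for an error metric such as
    MAE, pass the negated error as the reward.
    """

    def __init__(self, n_arms, epsilon=0.1, burn_in=0, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.burn_in = burn_in
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms
        self.mean_reward = [0.0] * n_arms

    def pull(self):
        # Burn-in (assumed semantics): return the first arm that has not yet
        # been pulled `burn_in` times.
        for arm in range(self.n_arms):
            if self.counts[arm] < self.burn_in:
                return arm
        # Explore a random arm with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return self.best_arm()

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.counts[arm]

    def best_arm(self):
        return max(range(self.n_arms), key=lambda a: self.mean_reward[a])


# Usage: three "models" whose (noisy) absolute errors differ on average.
true_mean_error = [1.0, 0.3, 0.7]
noise = random.Random(0)
policy = EpsilonGreedyPolicy(n_arms=3, epsilon=0.1, burn_in=5, seed=42)
for _ in range(500):
    arm = policy.pull()
    error = abs(noise.gauss(true_mean_error[arm], 0.1))
    policy.update(arm, reward=-error)  # negate: lower error means higher reward

print(policy.counts, policy.best_arm())
```

After burn-in, most pulls go to the arm with the lowest average error, while roughly a tenth of the steps still explore the others.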
Usage
Use bandit model selection when you want model selection that adapts as the stream's characteristics change, rather than committing to one model upfront. It is particularly effective when different models perform better on different segments of the data stream. The epsilon-greedy policy with decay is a good default: it starts with more exploration and shifts toward exploitation as epsilon decays. Set `burn_in` high enough that each model sees enough samples before the policy starts ranking them. Compared with successive halving, bandit selection has lower overhead, since after burn-in only the model(s) chosen by the policy are updated at each step, and it adapts continuously rather than in discrete elimination rounds.
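To see how `decay` moves the policy from exploration toward exploitation, the sketch below evaluates an exponential schedule, epsilon * exp(-decay * t). The exponential form is an assumption made for illustration, and `decayed_epsilon` is a hypothetical helper; check the `EpsilonGreedy` source in River's `bandit` module for the exact schedule it applies.

```python
import math


def decayed_epsilon(epsilon, decay, step):
    """Exploration rate after `step` pulls, ASSUMING exponential decay.

    The schedule epsilon * exp(-decay * step) is an illustrative assumption,
    not necessarily the formula River's EpsilonGreedy uses.
    """
    return epsilon * math.exp(-decay * step)


# Settings taken from the Usage Examples section: epsilon=0.1, decay=0.001.
for step in (0, 100, 1_000, 5_000):
    print(step, round(decayed_epsilon(0.1, 0.001, step), 4))
```

Under this assumed schedule, those settings would reduce the exploration rate from 0.1 to about 0.037 after 1,000 steps.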
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/model_selection/bandit.py
Signature
```python
class BanditRegressor(
    models,
    metric: metrics.base.RegressionMetric,
    policy: bandit.base.Policy,
)

class BanditClassifier(
    models,
    metric: metrics.base.ClassificationMetric,
    policy: bandit.base.Policy,
)
```
Import
```python
from river import model_selection
```
I/O Contract
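| Parameter | Type | Description |
|---|---|---|
| models | list | List of model instances to select from |
| metric | metrics.base.RegressionMetric (regressor) or metrics.base.ClassificationMetric (classifier) | Metric used to measure each model's performance |
| policy | bandit.base.Policy | Bandit policy that decides which model(s) to update |
| x | dict | Feature dictionary (argument of predict_one, predict_proba_one, learn_one) |
| y | Any | Target value (argument of learn_one) |

| Method / property | Return type | Description |
|---|---|---|
| predict_one(x) | Any | Prediction from the current best model |
| predict_proba_one(x) | dict | Class probabilities from the current best model (BanditClassifier only) |
| learn_one(x, y) | None | Updates the model(s) selected by the policy |
| best_model | Estimator | Property: the currently best-performing model |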
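The contract above can be illustrated with a small, self-contained selector. Everything here is hypothetical: `ToyBanditRegressor` and `RunningMeanRegressor` are not River classes, and the policy is a textbook UCB1 rule (mean reward plus sqrt(2 ln n / n_i)) applied to negated absolute error, swapped in to show a policy other than epsilon-greedy. As described above, only the model chosen by the policy is trained on each sample, and `predict_one` delegates to `best_model`.

```python
import math


class RunningMeanRegressor:
    """Hypothetical toy model: predicts an exponential moving average of y."""

    def __init__(self, lr):
        self.lr = lr
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.mean += self.lr * (y - self.mean)


class ToyBanditRegressor:
    """Hypothetical sketch of the predict_one / learn_one / best_model contract.

    Uses a UCB1 policy on negated absolute error; this is not River's code.
    """

    def __init__(self, models, burn_in=1):
        self.models = models
        # At least one pull per arm keeps the UCB1 bonus well defined.
        self.burn_in = max(1, burn_in)
        self.counts = [0] * len(models)
        self.mean_reward = [0.0] * len(models)
        self.n = 0

    def _pull(self):
        # Burn-in: pull any arm that has fewer than `burn_in` pulls.
        for i, count in enumerate(self.counts):
            if count < self.burn_in:
                return i
        # UCB1 index: mean reward plus a bonus that shrinks as an arm is pulled.
        return max(
            range(len(self.models)),
            key=lambda i: self.mean_reward[i]
            + math.sqrt(2 * math.log(self.n) / self.counts[i]),
        )

    @property
    def best_model(self):
        best = max(range(len(self.models)), key=lambda i: self.mean_reward[i])
        return self.models[best]

    def predict_one(self, x):
        return self.best_model.predict_one(x)

    def learn_one(self, x, y):
        i = self._pull()
        model = self.models[i]
        # Score the chosen model on this sample before training on it.
        reward = -abs(y - model.predict_one(x))
        self.n += 1
        self.counts[i] += 1
        self.mean_reward[i] += (reward - self.mean_reward[i]) / self.counts[i]
        model.learn_one(x, y)


# Usage: a constant target; the faster-adapting model should win.
models = [RunningMeanRegressor(lr=0.5), RunningMeanRegressor(lr=0.01)]
selector = ToyBanditRegressor(models, burn_in=1)
for _ in range(300):
    selector.learn_one({}, 5.0)
print(selector.counts, selector.predict_one({}))
```

UCB1's exploration bonus is unit-free while the rewards here are in target units, so on a large target scale it explores relatively little; rescaling the target is one way to rebalance the two terms.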
Usage Examples
```python
from river import bandit
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import model_selection
from river import optim
from river import preprocessing

# Create candidate models: linear regressions that differ only in SGD learning rate
models = [
    linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
    for lr in [0.0001, 0.001, 1e-05, 0.01]
]

# Apply bandit selection on standardized features
model = (
    preprocessing.StandardScaler() |
    model_selection.BanditRegressor(
        models,
        metric=metrics.MAE(),
        policy=bandit.EpsilonGreedy(
            epsilon=0.1,
            decay=0.001,
            burn_in=100,
            seed=42
        )
    )
)

# Evaluate with progressive validation (predict on each sample, then learn from it)
result = evaluate.progressive_val_score(
    datasets.TrumpApproval(),
    model,
    metrics.MAE()
)
print(result)  # MAE: 3.134089
```