Implementation:Online ml River Bandit LinUCBDisjoint
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Contextual_Bandits, Linear_Models, Bayesian_Methods |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A contextual bandit algorithm that uses Bayesian linear regression for each arm to compute upper confidence bounds based on context features.
Description
LinUCBDisjoint implements the disjoint variant of the Linear Upper Confidence Bound (LinUCB) algorithm. Each arm maintains its own BayesianLinearRegression model, which learns to predict that arm's reward from the context features. To select an arm, the policy queries each arm's posterior distribution for the current context and computes an upper confidence bound mu + sigma, where mu is the predicted mean reward and sigma is the standard deviation of the prediction. The arm with the highest bound is selected. An arm can therefore win either because its predicted reward is high (exploitation) or because its prediction is still uncertain (exploration), which is how the algorithm balances the two.
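The selection rule described above can be sketched without River in a few lines of plain Python. This is a minimal illustration, not the library's implementation: `ArmModel` and `select_arm` are hypothetical names, features are fixed-length lists rather than River's feature dicts, and the regression follows the standard textbook formulation (prior precision `alpha`, noise precision `beta`), which is assumed here rather than taken from the source file.

```python
import math
import random


class ArmModel:
    """Bayesian linear regression for one arm (hypothetical sketch, not River's class)."""

    def __init__(self, n_features, alpha=1.0, beta=1.0):
        self.beta = beta
        # Posterior covariance S starts at the prior covariance I / alpha.
        self.S = [[(1.0 / alpha if i == j else 0.0) for j in range(n_features)]
                  for i in range(n_features)]
        # Running sum beta * sum(y * x); the posterior mean of the weights is S @ b.
        self.b = [0.0] * n_features

    def _S_times(self, v):
        return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in self.S]

    def predict(self, x):
        """Return (mu, sigma) of the predictive distribution at context x."""
        weights = self._S_times(self.b)
        mu = sum(w * x_i for w, x_i in zip(weights, x))
        var = 1.0 / self.beta + sum(x_i * sx_i for x_i, sx_i in zip(x, self._S_times(x)))
        return mu, math.sqrt(var)

    def learn(self, x, y):
        """Rank-one (Sherman-Morrison) update of S, then accumulate b."""
        Sx = self._S_times(x)
        denom = 1.0 + self.beta * sum(x_i * sx_i for x_i, sx_i in zip(x, Sx))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.S[i][j] -= self.beta * Sx[i] * Sx[j] / denom
        for i in range(n):
            self.b[i] += self.beta * y * x[i]


def select_arm(models, x, rng):
    """Pick the arm with the largest mu + sigma, breaking ties at random."""
    bounds = {}
    for arm, model in models.items():
        mu, sigma = model.predict(x)
        bounds[arm] = mu + sigma
    best = max(bounds.values())
    return rng.choice([arm for arm, ub in bounds.items() if ub == best])
```

Because every arm owns a separate model, an update to one arm never changes the bound of another; that independence is what "disjoint" refers to.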
Usage
Use LinUCBDisjoint for contextual bandit problems in which the expected reward of each arm is approximately linear in the context features. It is particularly effective when the context carries useful information for arm selection, as in personalized recommendation systems. Each pull evaluates one Bayesian regression per arm, so the current implementation may be slow for large-scale applications.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/bandit/lin_ucb.py
Signature
class LinUCBDisjoint(bandit.base.ContextualPolicy):
    def __init__(
        self,
        alpha: float = 1.0,
        beta: float = 1.0,
        smoothing: float | None = None,
        reward_obj=None,
        burn_in=0,
        seed: int | None = None,
    ):
        ...
Import
from river import bandit
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| alpha | float (default: 1.0) | Prior precision parameter for Bayesian linear regression |
| beta | float (default: 1.0) | Noise precision parameter for Bayesian linear regression |
| smoothing | float (optional) | Smoothing parameter for Bayesian linear regression |
| reward_obj | RewardObj (optional) | Reward statistic |
| burn_in | int (default: 0) | Minimum pulls per arm before using UCB |
| seed | int (optional) | Random seed for reproducibility |
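To give a feel for how alpha and beta in the table above shape exploration, the sketch below computes the uncertainty term sigma for an arm that has not yet seen any data. It assumes the standard textbook predictive variance, 1/beta + ||x||^2/alpha under the prior; `prior_sigma` is a hypothetical helper and the feature names are made up for illustration.

```python
import math


def prior_sigma(x, alpha=1.0, beta=1.0):
    """Predictive standard deviation before any update, for a feature dict x.

    Assumes variance = 1/beta + ||x||^2 / alpha (textbook Bayesian linear regression).
    """
    squared_norm = sum(v * v for v in x.values())
    return math.sqrt(1.0 / beta + squared_norm / alpha)


x = {"age": 0.5, "is_mobile": 1.0}  # illustrative context
print(prior_sigma(x))               # default alpha and beta
print(prior_sigma(x, alpha=10.0))   # tighter prior: smaller exploration bonus
print(prior_sigma(x, beta=10.0))    # less assumed noise: smaller exploration bonus
```

Under this assumption, raising either parameter shrinks the initial bonus added to the predicted mean, so untried arms are explored less aggressively.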
Usage Examples
from river import bandit

# Load contextual bandit dataset
dataset = bandit.datasets.NewsArticles()

# Initialize LinUCB policy
policy = bandit.LinUCBDisjoint(
    alpha=1.0,
    beta=1.0,
    seed=42,
)

# Simulate contextual bandit scenario
for context, true_arm, true_reward in dataset:
    # Get available arms
    arms = dataset.arms

    # Select arm based on context
    chosen_arm = policy.pull(arms, context=context)

    # Only the logged arm's reward is known, so update only when the choice matches it
    if chosen_arm == true_arm:
        policy.update(chosen_arm, context, true_reward)

    # Stop once more than 1000 updates have been recorded
    # (policy._n is a private counter of updates, not of loop iterations)
    if policy._n > 1000:
        break

print(policy.ranking)
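The loop above is a form of replay evaluation: a logged event contributes only when the policy picks the same arm that was logged. A policy-agnostic version of that loop might look like the sketch below. `replay_evaluate` and `RandomPolicy` are hypothetical names, and the event layout `(context, arm, reward)` simply mirrors the tuple order used in the example above; neither is part of River.

```python
import random


def replay_evaluate(policy, arms, logged_events, max_events=1000):
    """Replay logged (context, arm, reward) events against a policy.

    An event is used only when the policy picks the logged arm, because
    that is the only arm whose reward was actually observed.
    """
    matched, total_reward = 0, 0.0
    for i, (context, logged_arm, reward) in enumerate(logged_events):
        if i >= max_events:
            break
        chosen = policy.pull(arms, context=context)
        if chosen == logged_arm:
            policy.update(chosen, context, reward)
            matched += 1
            total_reward += reward
    return matched, total_reward


class RandomPolicy:
    """Hypothetical stand-in exposing the same pull/update calls as the example."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.n_updates = 0

    def pull(self, arm_ids, context):
        return self._rng.choice(arm_ids)

    def update(self, arm_id, context, reward):
        self.n_updates += 1
```

Counting loop iterations with `enumerate` avoids relying on a private attribute of the policy, and the returned `matched` count shows how many logged events were actually usable.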