Implementation:Online ml River Bandit LinUCBDisjoint

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Contextual_Bandits, Linear_Models, Bayesian_Methods
Last Updated	2026-02-08 16:00 GMT

Overview

A contextual bandit policy that fits a separate Bayesian linear regression for each arm and uses it to compute an upper confidence bound on the reward given the context features.

Description

LinUCBDisjoint implements the disjoint variant of the Linear Upper Confidence Bound algorithm. "Disjoint" means that no parameters are shared between arms: each arm maintains its own BayesianLinearRegression model, which learns to predict rewards from context features and is updated only when that arm's reward is observed. When selecting an arm, the algorithm asks each arm's model for the predictive distribution of the reward given the current context and computes the upper confidence bound as mu + sigma, where mu is the predicted mean and sigma is the uncertainty (standard deviation) of that prediction. The arm with the highest upper bound is selected. This balances exploration and exploitation: an arm can win either because its predicted reward is high or because its prediction is still uncertain.
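The selection rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not River's implementation: `ArmModel`, `upper_bound` and `select_arm` are hypothetical names, the per-arm model is a small hand-written Bayesian linear regression standing in for BayesianLinearRegression, and sigma is assumed to include the noise term 1/beta.

```python
import math


class ArmModel:
    """Tiny Bayesian linear regression for one arm (illustrative stand-in)."""

    def __init__(self, n_features, alpha=1.0, beta=1.0):
        self.beta = beta
        # Posterior covariance starts at the prior covariance (1 / alpha) * I
        self.S = [
            [1.0 / alpha if i == j else 0.0 for j in range(n_features)]
            for i in range(n_features)
        ]
        # Accumulates beta * sum(x * y) over the observations seen so far
        self.b = [0.0] * n_features

    def _S_dot(self, v):
        return [sum(s * v_j for s, v_j in zip(row, v)) for row in self.S]

    def predict(self, x):
        """Return the predictive mean and standard deviation for context x."""
        weights = self._S_dot(self.b)  # posterior mean of the weights
        mu = sum(w * x_i for w, x_i in zip(weights, x))
        Sx = self._S_dot(x)
        var = 1.0 / self.beta + sum(x_i * sx_i for x_i, sx_i in zip(x, Sx))
        return mu, math.sqrt(var)

    def learn(self, x, y):
        """Rank-one (Sherman-Morrison) update of the posterior covariance."""
        Sx = self._S_dot(x)
        denom = 1.0 + self.beta * sum(x_i * sx_i for x_i, sx_i in zip(x, Sx))
        for i in range(len(x)):
            for j in range(len(x)):
                self.S[i][j] -= self.beta * Sx[i] * Sx[j] / denom
        for i in range(len(x)):
            self.b[i] += self.beta * x[i] * y


def upper_bound(model, x):
    mu, sigma = model.predict(x)
    return mu + sigma


def select_arm(models, x):
    """Pick the arm whose upper bound mu + sigma is largest."""
    return max(models, key=lambda arm_id: upper_bound(models[arm_id], x))


# One independent model per arm: this is what makes the variant "disjoint"
models = {"a": ArmModel(n_features=2), "b": ArmModel(n_features=2)}

# Only arm "a" has observed rewards for this context
context = [1.0, 0.0]
for _ in range(50):
    models["a"].learn(context, 1.0)

chosen = select_arm(models, context)
```

Here arm "a" wins on its high predicted mean, while the untrained arm "b" has a predicted mean of zero and only its prior uncertainty to offer. With fewer observations or smaller rewards for "a", the uncertainty of "b" could dominate instead, which is the exploration side of the rule.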

Usage

Use LinUCBDisjoint for contextual bandit problems in which the expected reward of each arm is approximately linear in the context features. It is particularly effective when the context carries real information about which arm to choose, as in personalized recommendation systems. Because one regression model is maintained per arm and every arm is scored on each pull, the current implementation may be too slow for large-scale applications.

Code Reference

Source Location

Repository: Online_ml_River
File: river/bandit/lin_ucb.py

Signature

class LinUCBDisjoint(bandit.base.ContextualPolicy):
    def __init__(
        self,
        alpha: float = 1.0,
        beta: float = 1.0,
        smoothing: float | None = None,
        reward_obj=None,
        burn_in=0,
        seed: int | None = None,
    ):
        ...

Import

from river import bandit

I/O Contract

Parameter	Type	Description
alpha	float (default: 1.0)	Prior precision parameter for Bayesian linear regression
beta	float (default: 1.0)	Noise precision parameter for Bayesian linear regression
smoothing	float (optional)	Smoothing parameter for Bayesian linear regression
reward_obj	RewardObj (optional)	Statistic that summarizes the rewards observed for each arm; defaults to a running mean
burn_in	int (default: 0)	Minimum pulls per arm before using UCB
seed	int (optional)	Random seed for reproducibility
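The burn_in parameter can be read as a guard in front of the UCB rule. The sketch below illustrates that reading only; `pull_with_burn_in` and `choose_by_ucb` are hypothetical names, and the order in which under-sampled arms are visited is an assumption rather than something this page specifies.

```python
def pull_with_burn_in(arm_ids, counts, burn_in, choose_by_ucb):
    """Return an arm, forcing under-sampled arms before consulting UCB.

    arm_ids: list of arm identifiers
    counts: dict mapping arm id to the number of pulls so far
    burn_in: minimum pulls per arm before the UCB rule is used
    choose_by_ucb: callable that takes arm_ids and returns one of them
    """
    for arm_id in arm_ids:
        if counts.get(arm_id, 0) < burn_in:
            return arm_id
    return choose_by_ucb(arm_ids)


def always_a(arm_ids):
    """Stand-in for the UCB rule that always prefers arm "a"."""
    return "a"


counts = {"a": 2, "b": 0}

# Arm "b" has not reached burn_in yet, so it is pulled regardless of UCB
first = pull_with_burn_in(["a", "b"], counts, burn_in=1, choose_by_ucb=always_a)
counts[first] += 1

# Every arm now has at least burn_in pulls, so the UCB rule decides
second = pull_with_burn_in(["a", "b"], counts, burn_in=1, choose_by_ucb=always_a)
```

With the default burn_in of 0 the guard never triggers and every pull goes straight to the UCB rule.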

Usage Examples

from river import bandit

# Load contextual bandit dataset
dataset = bandit.datasets.NewsArticles()

# Initialize LinUCB policy
policy = bandit.LinUCBDisjoint(
    alpha=1.0,
    beta=1.0,
    seed=42
)

# Simulate a contextual bandit scenario by replaying logged events
n_updates = 0
for context, true_arm, true_reward in dataset:
    # Arms available in this dataset
    arms = dataset.arms

    # Select an arm based on the context
    chosen_arm = policy.pull(arms, context=context)

    # The logged reward only applies to the logged arm, so the policy
    # learns only from rounds where its choice matches that arm
    if chosen_arm == true_arm:
        policy.update(chosen_arm, context, true_reward)
        n_updates += 1

    # Stop after 1000 updates (explicit counter instead of the private policy._n)
    if n_updates >= 1000:
        break

print(policy.ranking)
