

Implementation:Online ml River Bandit LinUCBDisjoint

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Contextual_Bandits, Linear_Models, Bayesian_Methods
Last Updated 2026-02-08 16:00 GMT

Overview

A contextual bandit algorithm that uses Bayesian linear regression for each arm to compute upper confidence bounds based on context features.

Description

LinUCBDisjoint implements the disjoint variant of the Linear Upper Confidence Bound (LinUCB) algorithm. "Disjoint" means the arms share no parameters: each arm maintains its own BayesianLinearRegression model, which learns to predict that arm's reward from the context features. To select an arm, the algorithm queries each model's posterior distribution for the current context and computes the upper confidence bound mu + sigma, where mu is the predicted mean reward and sigma is the uncertainty of that prediction. The arm with the highest upper bound is selected. Because an arm can rank first through either a high mu or a high sigma, this rule balances exploitation (high predicted reward) against exploration (high uncertainty).
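The selection rule described above can be sketched in plain Python. This is a minimal illustration, not River's implementation: it assumes a single scalar context feature with no intercept, and the names ScalarBayesLinReg and select_arm are hypothetical. The posterior update follows the standard conjugate Bayesian linear regression formulas, with alpha as the prior precision and beta as the noise precision.

```python
import math


class ScalarBayesLinReg:
    """Bayesian linear regression y ~ w * x with one feature (illustrative only)."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha = alpha      # prior precision on the weight w
        self.beta = beta        # precision of the observation noise
        self.precision = alpha  # posterior precision of w
        self.xy_sum = 0.0       # running sum of x * y

    def learn_one(self, x: float, y: float) -> None:
        # Each observation adds beta * x^2 to the posterior precision.
        self.precision += self.beta * x * x
        self.xy_sum += x * y

    def predict_dist(self, x: float) -> tuple[float, float]:
        # Posterior mean of w, then predictive mean and standard deviation at x.
        w_mean = self.beta * self.xy_sum / self.precision
        mu = w_mean * x
        var = 1.0 / self.beta + x * x / self.precision
        return mu, math.sqrt(var)


def select_arm(models: dict[str, ScalarBayesLinReg], x: float) -> str:
    """Return the arm whose upper bound mu + sigma is largest."""

    def upper_bound(arm: str) -> float:
        mu, sigma = models[arm].predict_dist(x)
        return mu + sigma

    return max(models, key=upper_bound)


# One model per arm: the "disjoint" part of the algorithm.
models = {"a": ScalarBayesLinReg(), "b": ScalarBayesLinReg()}

# Arm "a" has been pulled often with a modest reward; arm "b" is untried.
for _ in range(50):
    models["a"].learn_one(1.0, 0.2)
first_choice = select_arm(models, x=1.0)

# After "b" has been tried just as often with zero reward, its uncertainty shrinks.
for _ in range(50):
    models["b"].learn_one(1.0, 0.0)
second_choice = select_arm(models, x=1.0)
```

In this toy run the untried arm is selected first because its sigma is large, and the better-known arm is selected once both arms have comparable uncertainty.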

Usage

Use LinUCBDisjoint for contextual bandit problems in which each arm's expected reward is approximately linear in the context features. It is most useful when the context carries real information about which arm to choose, as in personalized recommendation systems. Because every pull evaluates one regression model per arm, the current implementation may be too slow for large-scale applications.

Code Reference

Source Location

Signature

class LinUCBDisjoint(bandit.base.ContextualPolicy):
    def __init__(
        self,
        alpha: float = 1.0,
        beta: float = 1.0,
        smoothing: float | None = None,
        reward_obj=None,
        burn_in=0,
        seed: int | None = None,
    ):
        ...

Import

from river import bandit

I/O Contract

Parameter  | Type                           | Description
alpha      | float (default: 1.0)           | Prior precision parameter for Bayesian linear regression
beta       | float (default: 1.0)           | Noise precision parameter for Bayesian linear regression
smoothing  | float or None (default: None)  | Smoothing parameter for Bayesian linear regression
reward_obj | RewardObj (optional)           | Reward statistic
burn_in    | int (default: 0)               | Minimum pulls per arm before using UCB
seed       | int or None (default: None)    | Random seed for reproducibility
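The burn_in parameter can be read as a guard placed in front of the UCB rule. The sketch below shows one way such a guard could work; it is an assumption for illustration rather than River's code, and pull_with_burn_in, counts, and ucb_choice are hypothetical names.

```python
from collections import Counter


def pull_with_burn_in(arm_ids, counts, burn_in, ucb_choice):
    """Return an under-sampled arm if there is one, otherwise defer to the UCB rule."""
    for arm_id in arm_ids:
        if counts[arm_id] < burn_in:
            return arm_id
    return ucb_choice(arm_ids)


counts = Counter()  # number of pulls recorded per arm
history = []
for _ in range(7):
    arm = pull_with_burn_in(
        ["a", "b", "c"],
        counts,
        burn_in=2,
        ucb_choice=lambda ids: ids[-1],  # stand-in for the real UCB selection
    )
    counts[arm] += 1
    history.append(arm)
```

With burn_in=2 and three arms, the first six pulls are spent giving every arm two observations, and only the seventh pull reaches the stand-in UCB rule.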

Usage Examples

from river import bandit

# Load contextual bandit dataset
dataset = bandit.datasets.NewsArticles()

# Initialize LinUCB policy
policy = bandit.LinUCBDisjoint(
    alpha=1.0,
    beta=1.0,
    seed=42
)

# Simulate contextual bandit scenario
for context, true_arm, true_reward in dataset:
    # Get available arms
    arms = dataset.arms

    # Select arm based on context
    chosen_arm = policy.pull(arms, context=context)

    # In practice, only observe reward for chosen arm
    if chosen_arm == true_arm:
        policy.update(chosen_arm, context, true_reward)

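    # Stop once the policy's internal counter exceeds 1000. Updates happen
    # only on matches, so this can take many more than 1000 loop iterations.
    # Note: _n is a private attribute and may change between versions.
    if policy._n > 1000:
        break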

print(policy.ranking)
