Implementation:Online ml River Ensemble Boosting
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Ensemble_Methods, Boosting |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Online boosting classifiers that adaptively weight training instances using Poisson-distributed resampling for sequential learning.
Description
This module implements three online boosting classifiers: AdaBoostClassifier (basic online boosting), ADWINBoostingClassifier (adds drift detection), and BOLEClassifier (reorders the ensemble during training). All three present each instance to a base model k times, with k drawn from Poisson(lambda), and adapt lambda as the instance moves through the ensemble. AdaBoostClassifier updates lambda after each model according to whether that model classified the instance correctly, increasing it after errors and decreasing it after correct predictions, so that later models concentrate on harder instances. ADWINBoostingClassifier adds drift detection to replace poorly performing models. BOLEClassifier changes the order in which models are trained, prioritizing the worst performers, and adjusts lambda based on prediction correctness, creating a "rich get richer" dynamic that favors better models.
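The Poisson resampling and lambda adaptation described above can be sketched in plain Python. This is a minimal illustration that assumes the classic Oza-Russell online boosting rule; the helper names `poisson` and `update_lambda` are hypothetical and are not taken from river/ensemble/boosting.py.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one sample from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def update_lambda(lam, is_correct, correct_w, wrong_w):
    """Assumed Oza-Russell style update; returns (new_lam, correct_w, wrong_w)."""
    if is_correct:
        correct_w += lam
        lam *= (correct_w + wrong_w) / (2 * correct_w)
    else:
        wrong_w += lam
        lam *= (correct_w + wrong_w) / (2 * wrong_w)
    return lam, correct_w, wrong_w


rng = random.Random(42)
# Number of times a base model would be trained on the current (x, y)
n_updates = poisson(1.0, rng)

# A model that has been mostly right so far (correct weight 3, wrong weight 1)
lam_after_hit, _, _ = update_lambda(1.0, True, 3.0, 1.0)    # 0.625
lam_after_miss, _, _ = update_lambda(1.0, False, 3.0, 1.0)  # 1.25
```

Under this rule, lambda shrinks after a correct prediction and grows after an error only while the model's weighted error rate stays below one half, which is the usual working assumption for boosting.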
Usage
Use AdaBoostClassifier for basic online boosting with any base classifier. Choose ADWINBoostingClassifier when concept drift is expected, as it detects changes and replaces poorly performing models automatically. Use BOLEClassifier for imbalanced datasets or when you want more sophisticated model ordering. All three are commonly paired with decision trees (for example HoeffdingTreeClassifier) as base learners and can improve on a single model, although the gain depends on the data stream and the base learner.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/ensemble/boosting.py
Signature
class AdaBoostClassifier(base.WrapperEnsemble, base.Classifier):
    def __init__(self, model: base.Classifier, n_models=10, seed: int | None = None):
        super().__init__(model, n_models, seed)
        self.wrong_weight: collections.defaultdict = collections.defaultdict(int)
        self.correct_weight: collections.defaultdict = collections.defaultdict(int)

class ADWINBoostingClassifier(AdaBoostClassifier):
    def __init__(self, model: base.Classifier, n_models=10, seed: int | None = None):
        super().__init__(model, n_models, seed)
        self._drift_detectors = [drift.ADWIN() for _ in range(self.n_models)]

class BOLEClassifier(AdaBoostClassifier):
    def __init__(
        self, model: base.Classifier, n_models=10, seed: int | None = None, error_bound=0.5
    ):
        super().__init__(model=model, n_models=n_models, seed=seed)
        self.error_bound = error_bound
        self.order_position = [i for i in range(n_models)]
        self.instances_seen = 0
Import
from river import ensemble
I/O Contract
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Classifier | required | Base classifier to boost |
| n_models | int | 10 | Number of models in ensemble |
| seed | int or None | None | Random seed for reproducibility |
| error_bound | float | 0.5 | Error threshold for BOLE voting (BOLE only) |
Attributes
| Attribute | Type | Description |
|---|---|---|
| wrong_weight | defaultdict | Cumulative error weights per model |
| correct_weight | defaultdict | Cumulative correct weights per model |
| models | list | Ensemble of base classifiers |
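To illustrate how the two cumulative weights could feed into the ensemble vote, here is a minimal sketch that assumes the textbook AdaBoost weighting log((1 - eps) / eps), where eps is a model's weighted error rate. The functions `vote_weight` and `weighted_vote` are hypothetical names, and the edge-case handling (zero weight for untrained or worse-than-chance models) is a choice made for this sketch, not necessarily what the module does.

```python
import math


def vote_weight(correct_w: float, wrong_w: float) -> float:
    """Turn cumulative correct/wrong weights into a voting weight."""
    total = correct_w + wrong_w
    if total == 0:
        return 0.0  # sketch choice: no evidence yet, so no say in the vote
    eps = wrong_w / total
    eps = min(max(eps, 1e-12), 1 - 1e-12)  # keep the logarithm finite
    # sketch choice: models at or below chance level get zero weight
    return max(0.0, math.log((1 - eps) / eps))


def weighted_vote(probas, weights):
    """Combine per-model dict[label, prob] outputs into one normalised dict."""
    combined = {}
    for proba, w in zip(probas, weights):
        for label, p in proba.items():
            combined[label] = combined.get(label, 0.0) + w * p
    norm = sum(combined.values())
    return {k: v / norm for k, v in combined.items()} if norm > 0 else combined


# Model 0 has error rate 0.25, model 1 is at chance level (0.5)
weights = [vote_weight(3.0, 1.0), vote_weight(1.0, 1.0)]
probas = [{True: 0.8, False: 0.2}, {True: 0.1, False: 0.9}]
result = weighted_vote(probas, weights)
```

In this example the second model contributes nothing, so the combined distribution follows the first model's output.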
Input/Output
| Method | Input | Output |
|---|---|---|
| learn_one | x: dict, y: Any | None |
| predict_proba_one | x: dict | dict[Any, float] |
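The two methods in the table are typically exercised in a predict-then-learn loop. The sketch below shows that contract with a hypothetical stand-in class (`MajorityClassStandIn`) rather than a River model, so it runs without any dependency; any object exposing learn_one and predict_proba_one with the shapes listed above should be usable in its place.

```python
from collections import Counter


class MajorityClassStandIn:
    """Hypothetical stand-in exposing the same two methods as the table."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x: dict, y) -> None:
        self.counts[y] += 1

    def predict_proba_one(self, x: dict) -> dict:
        total = sum(self.counts.values())
        if total == 0:
            return {}  # nothing learned yet
        return {label: c / total for label, c in self.counts.items()}


def prequential_accuracy(stream, model) -> float:
    """Predict on each instance first, then learn from it."""
    hits = n = 0
    for x, y in stream:
        proba = model.predict_proba_one(x)
        if proba:  # skip scoring until the model can predict
            y_pred = max(proba, key=proba.get)
            hits += y_pred == y
            n += 1
        model.learn_one(x, y)
    return hits / n if n else 0.0


stream = [
    ({"f": 1.0}, True),
    ({"f": 0.2}, True),
    ({"f": 0.9}, False),
    ({"f": 0.4}, True),
]
acc = prequential_accuracy(stream, MajorityClassStandIn())
```

Predicting before learning keeps the score honest, because each instance is evaluated before the model has seen its label.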
Usage Examples
# AdaBoostClassifier
from river import datasets
from river import ensemble
from river import evaluate
from river import metrics
from river import tree
dataset = datasets.Phishing()
metric = metrics.LogLoss()
model = ensemble.AdaBoostClassifier(
    model=(
        tree.HoeffdingTreeClassifier(
            split_criterion='gini',
            delta=1e-5,
            grace_period=2000
        )
    ),
    n_models=5,
    seed=42
)
evaluate.progressive_val_score(dataset, model, metric)
# LogLoss: 0.370805
# ADWINBoostingClassifier
from river import linear_model
from river import preprocessing
dataset = datasets.Phishing()
model = ensemble.ADWINBoostingClassifier(
    model=(
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    n_models=3,
    seed=42
)
metric = metrics.F1()
evaluate.progressive_val_score(dataset, model, metric)
# F1: 87.61%
# BOLEClassifier
from river import drift
dataset = datasets.Elec2().take(3000)
model = ensemble.BOLEClassifier(
    model=drift.DriftRetrainingClassifier(
        model=tree.HoeffdingTreeClassifier(),
        drift_detector=drift.binary.DDM()
    ),
    n_models=10,
    seed=42
)
metric = metrics.Accuracy()
evaluate.progressive_val_score(dataset, model, metric)
# Accuracy: 93.63%