Implementation:Online ml River Anomaly HalfSpaceTrees

Knowledge Sources	River, River Docs, Fast Anomaly Detection for Streaming Data
Domains	Online Machine Learning, Anomaly Detection, Ensemble Methods
Last Updated	2026-02-08 16:00 GMT

Overview

Concrete tool for performing online anomaly detection using Half-Space Trees in the River library, implementing an ensemble of randomly partitioned trees with a dual-window mass estimation scheme.

Description

The anomaly.HalfSpaceTrees class implements the Half-Space Trees algorithm for streaming anomaly detection. It builds an ensemble of n_trees random binary trees, each with the depth given by the height parameter, that partition the feature space using random axis-aligned splits. Every node keeps two mass counters, one for the working window and one for the reference window. Every window_size observations the windows are pivoted: the working counts become the reference counts and the working counters are reset.
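
The dual-window bookkeeping described above can be sketched in plain Python. This is a minimal illustration under simplifying assumptions, not River's implementation: it handles a single feature in [0, 1], and the names Node, learn, pivot, l_mass and r_mass are chosen for the example. Scoring is left out on purpose.

```python
import random


class Node:
    """One node of a toy half-space tree (illustrative names, not River's API)."""

    def __init__(self, depth, height, lo, hi, rng):
        self.l_mass = 0  # count for the working (latest) window
        self.r_mass = 0  # count for the reference window
        self.left = self.right = None
        if depth < height:
            # Random axis-aligned split inside this node's interval (1-D here)
            self.split = rng.uniform(lo, hi)
            self.left = Node(depth + 1, height, lo, self.split, rng)
            self.right = Node(depth + 1, height, self.split, hi, rng)

    def walk(self, x):
        """Yield the nodes on the root-to-leaf path that x follows."""
        node = self
        while node is not None:
            yield node
            if node.left is None:
                return
            node = node.left if x < node.split else node.right

    def all_nodes(self):
        """Yield every node in the subtree rooted here."""
        yield self
        if self.left is not None:
            yield from self.left.all_nodes()
            yield from self.right.all_nodes()


def learn(trees, x):
    """Increment the working mass of every node x passes through."""
    for tree in trees:
        for node in tree.walk(x):
            node.l_mass += 1


def pivot(trees):
    """Working counts become reference counts; working counters restart at 0."""
    for tree in trees:
        for node in tree.all_nodes():
            node.r_mass, node.l_mass = node.l_mass, 0


rng = random.Random(42)
trees = [Node(0, 3, 0.0, 1.0, rng) for _ in range(5)]
window_size = 4

for i, x in enumerate([0.5, 0.45, 0.43, 0.44], start=1):
    learn(trees, x)
    if i % window_size == 0:
        pivot(trees)

# Every observation passes through the root, so after one full window
# the root holds the whole window in its reference counter.
print(trees[0].r_mass, trees[0].l_mass)  # 4 0
```

The root counters make the pivot easy to check by hand: all four observations traverse the root, so its reference mass equals the window size once the window closes.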

By default, the implementation assumes features are bounded in [0, 1]. If features have different ranges, you can either specify explicit limits or prepend a preprocessing.MinMaxScaler in a pipeline.
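
To make the scaling requirement concrete, here is a rough sketch of what online min-max scaling does to a single feature. RunningMinMax is a hypothetical helper written for this page; it is not River's preprocessing.MinMaxScaler and only assumes that the scaler tracks the smallest and largest values seen so far.

```python
class RunningMinMax:
    """Toy online min-max scaler for one feature (hypothetical helper)."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, value):
        # Track the running minimum and maximum
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def scale(self, value):
        span = self.hi - self.lo
        if span <= 0:  # fewer than two distinct values seen so far
            return 0.0
        return (value - self.lo) / span


scaler = RunningMinMax()
scaled = []
for v in [20.0, 30.0, 25.0, 40.0]:
    scaler.update(v)
    scaled.append(scaler.scale(v))

print(scaled)  # [0.0, 1.0, 0.5, 1.0]
```

Because each value is folded into the running range before it is scaled, the output always lands in [0, 1], which is the range the detector assumes when no limits are given.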

Trees are lazily constructed on the first call to learn_one, so the first call may be slower than subsequent ones. Time complexity for both learn_one and score_one scales linearly with the number of trees and exponentially with the height of each tree.
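
A quick back-of-the-envelope calculation shows why height is the expensive knob. The sketch below only applies the node-count formula for a full binary tree, 2^(h+1) - 1; the helper names are made up for the example and the figures say nothing about actual runtime.

```python
def nodes_per_tree(height):
    """Number of nodes in a full binary tree of the given height."""
    return 2 ** (height + 1) - 1


def total_nodes(n_trees, height):
    """Nodes across the whole ensemble."""
    return n_trees * nodes_per_tree(height)


for h in (3, 8, 12):
    print(f"height={h}: {nodes_per_tree(h)} nodes per tree, "
          f"{total_nodes(10, h)} nodes for 10 trees")
# height=3: 15 nodes per tree, 150 nodes for 10 trees
# height=8: 511 nodes per tree, 5110 nodes for 10 trees
# height=12: 8191 nodes per tree, 81910 nodes for 10 trees
```

Doubling n_trees doubles the node count, while each extra level of height roughly doubles it again per tree.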

The class inherits from anomaly.base.AnomalyDetector.

Usage

Import and use anomaly.HalfSpaceTrees when:

You need an online anomaly detector that works on streaming data
You want normalized anomaly scores in [0, 1]
You prefer an ensemble method with tunable complexity
Your features are bounded in [0, 1] or you can apply min-max scaling upstream

Code Reference

Source Location

river/anomaly/hst.py, lines 94-286.

Signature

class HalfSpaceTrees(anomaly.base.AnomalyDetector):
    def __init__(
        self,
        n_trees=10,
        height=8,
        window_size=250,
        limits: dict[str, tuple[float, float]] | None = None,
        seed: int | None = None,
    ):

Import

from river import anomaly
model = anomaly.HalfSpaceTrees(n_trees=10, height=8, window_size=250, seed=42)

Parameters

Parameter	Type	Default	Description
n_trees	int	10	Number of trees in the ensemble.
height	int	8	Height of each tree. A tree of height h has 2^(h+1) - 1 nodes.
window_size	int	250	Number of observations per window. After this many observations, the working window is pivoted to the reference window.
limits	dict or None	None	Specifies the range of each feature as {feature_name: (min, max)}. Defaults to [0, 1] for all features.
seed	int or None	None	Random number seed for reproducibility.

Methods

learn_one(x: dict) -> None -- Updates mass counters in each tree and pivots windows when the window is full.
score_one(x: dict) -> float -- Returns an anomaly score in [0, 1] where high values indicate anomalies.

I/O Contract

Inputs

Method	Parameter	Type	Description
learn_one	x	dict	A dictionary mapping feature names to numeric values. Features should be in [0, 1] unless custom limits are provided.
score_one	x	dict	A dictionary mapping feature names to numeric values.

Outputs

Method	Return Type	Description
learn_one	None	Updates internal tree mass counters; no return value.
score_one	float	Anomaly score in [0, 1]. High scores indicate anomalies; low scores indicate normal observations. Returns 0 during the first window (before any reference data is available).
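
Since score_one returns a score rather than a label, a downstream step usually has to decide what counts as anomalous. The sketch below shows one simple option, a fixed cutoff. The flag_anomalies helper and the 0.8 threshold are assumptions made for illustration; a suitable threshold depends on the data.

```python
def flag_anomalies(scores, threshold=0.8):
    """Mark a score as anomalous when it reaches the (arbitrary) threshold."""
    return [score >= threshold for score in scores]


# Scores in [0, 1], higher meaning more anomalous
scores = [0.107, 0.071, 0.107, 0.853]
flags = flag_anomalies(scores)

print(flags)  # [False, False, False, True]
```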

Usage Examples

Basic anomaly scoring:

from river import anomaly

X = [0.5, 0.45, 0.43, 0.44, 0.445, 0.45, 0.0]
hst = anomaly.HalfSpaceTrees(
    n_trees=5,
    height=3,
    window_size=3,
    seed=42
)

# Warm up the model
for x in X[:3]:
    hst.learn_one({'x': x})

# Score observations
for x in X:
    features = {'x': x}
    hst.learn_one(features)
    print(f'Anomaly score for x={x:.3f}: {hst.score_one(features):.3f}')
# Anomaly score for x=0.500: 0.107
# Anomaly score for x=0.450: 0.071
# Anomaly score for x=0.430: 0.107
# Anomaly score for x=0.440: 0.107
# Anomaly score for x=0.445: 0.107
# Anomaly score for x=0.450: 0.071
# Anomaly score for x=0.000: 0.853

Pipeline with MinMaxScaler on the CreditCard dataset:

from river import compose, preprocessing, anomaly, datasets, metrics

model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(seed=42)
)

auc = metrics.ROCAUC()

for x, y in datasets.CreditCard().take(2500):
    score = model.score_one(x)
    model.learn_one(x)
    auc.update(y, score)

print(auc)
# ROCAUC: 91.15%

Using progressive_val_score for evaluation:

from river import compose, preprocessing, anomaly, datasets, metrics, evaluate

model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(seed=42)
)

evaluate.progressive_val_score(
    dataset=datasets.CreditCard().take(2500),
    model=model,
    metric=metrics.ROCAUC(),
    print_every=1000
)
# [1,000] ROCAUC: 88.43%
# [2,000] ROCAUC: 89.28%
# [2,500] ROCAUC: 91.15%
# ROCAUC: 91.15%
