Implementation:Online ml River Anomaly HalfSpaceTrees


Knowledge Sources: River; River Docs; Fast Anomaly Detection for Streaming Data
Domains: Online Machine Learning, Anomaly Detection, Ensemble Methods
Last Updated: 2026-02-08 16:00 GMT

Overview

A concrete tool for online anomaly detection with Half-Space Trees in the River library. It implements an ensemble of randomly partitioned trees with a dual-window mass estimation scheme.

Description

The anomaly.HalfSpaceTrees class implements the Half-Space Trees algorithm for streaming anomaly detection. It builds an ensemble of n_trees random binary trees, each with the depth given by the height parameter, that partition the feature space using random axis-aligned splits. Every node keeps two mass counters, one for the working window and one for the reference window. After every window_size observations the working window is pivoted to the reference window and a new working window begins.
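The dual-counter bookkeeping can be illustrated with a toy, pure-Python sketch. This is not River's implementation: the names Node, build_tree, walk, iter_nodes and ToyHalfSpaceTrees are hypothetical, split thresholds are drawn uniformly within the current range as a simplifying assumption, and scoring is left out entirely. It only shows how the working and reference counters could be updated and pivoted.

```python
import random


class Node:
    """One tree node holding the two mass counters (hypothetical helper)."""

    def __init__(self, feature=None, threshold=None, left=None, right=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.l_mass = 0  # mass observed in the working (latest) window
        self.r_mass = 0  # mass observed in the reference window


def build_tree(limits, height, rng):
    """Recursively build a random axis-aligned tree inside the given limits."""
    if height == 0:
        return Node()  # leaf: no split
    feature = rng.choice(sorted(limits))
    low, high = limits[feature]
    threshold = rng.uniform(low, high)  # assumption: plain uniform split point
    left = build_tree({**limits, feature: (low, threshold)}, height - 1, rng)
    right = build_tree({**limits, feature: (threshold, high)}, height - 1, rng)
    return Node(feature, threshold, left, right)


def walk(node, x):
    """Yield every node on the path from the root to the leaf that contains x."""
    while node is not None:
        yield node
        if node.feature is None:
            return
        node = node.left if x[node.feature] < node.threshold else node.right


def iter_nodes(node):
    """Yield every node of the tree, depth first."""
    if node is None:
        return
    yield node
    yield from iter_nodes(node.left)
    yield from iter_nodes(node.right)


class ToyHalfSpaceTrees:
    """Toy ensemble that only tracks masses; it does not compute scores."""

    def __init__(self, limits, n_trees=3, height=2, window_size=4, seed=0):
        rng = random.Random(seed)
        self.trees = [build_tree(limits, height, rng) for _ in range(n_trees)]
        self.window_size = window_size
        self.counter = 0

    def learn_one(self, x):
        # Count the observation in the working window along its path.
        for tree in self.trees:
            for node in walk(tree, x):
                node.l_mass += 1
        # Once the window is full, the working masses become the reference.
        self.counter += 1
        if self.counter == self.window_size:
            for tree in self.trees:
                for node in iter_nodes(tree):
                    node.r_mass, node.l_mass = node.l_mass, 0
            self.counter = 0


toy = ToyHalfSpaceTrees(limits={"x": (0.0, 1.0)})
for value in [0.1, 0.2, 0.3]:
    toy.learn_one({"x": value})
root = toy.trees[0]
print(root.l_mass, root.r_mass)  # 3 0 (window not yet full)
toy.learn_one({"x": 0.4})
print(root.l_mass, root.r_mass)  # 0 4 (pivot happened on the 4th observation)
```

The root is a convenient node to inspect because every observation passes through it, so its counters always equal the number of observations in each window.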

By default, the implementation assumes every feature is bounded in [0, 1]. If your features have different ranges, either pass explicit bounds through the limits parameter or prepend a preprocessing.MinMaxScaler in a pipeline.

Trees are lazily constructed on the first call to learn_one, so the first call may be slower than subsequent ones. Time complexity for both learn_one and score_one scales linearly with the number of trees and exponentially with the height of each tree.

The class inherits from anomaly.base.AnomalyDetector.

Usage

Import and use anomaly.HalfSpaceTrees when:

  • You need an online anomaly detector that works on streaming data
  • You want normalized anomaly scores in [0, 1]
  • You prefer an ensemble method with tunable complexity
  • Your features are bounded in [0, 1] or you can apply min-max scaling upstream

Code Reference

Source Location

river/anomaly/hst.py, lines 94-286.

Signature

class HalfSpaceTrees(anomaly.base.AnomalyDetector):
    def __init__(
        self,
        n_trees=10,
        height=8,
        window_size=250,
        limits: dict[str, tuple[float, float]] | None = None,
        seed: int | None = None,
    ):

Import

from river import anomaly
model = anomaly.HalfSpaceTrees(n_trees=10, height=8, window_size=250, seed=42)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_trees | int | 10 | Number of trees in the ensemble. |
| height | int | 8 | Height of each tree. A tree of height h has 2^(h+1) - 1 nodes. |
| window_size | int | 250 | Number of observations per window. After this many observations, the working window is pivoted to the reference window. |
| limits | dict or None | None | Specifies the range of each feature as {feature_name: (min, max)}. Defaults to [0, 1] for all features. |
| seed | int or None | None | Random number seed for reproducibility. |
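To make the node-count formula for height concrete, the helpers below compute the number of nodes per tree and per ensemble. The function names are hypothetical and not part of River; the arithmetic follows directly from 2^(h+1) - 1.

```python
def nodes_per_tree(height: int) -> int:
    """Number of nodes in a full binary tree with height + 1 levels."""
    return 2 ** (height + 1) - 1


def nodes_in_ensemble(n_trees: int, height: int) -> int:
    """Total number of nodes across all trees in the ensemble."""
    return n_trees * nodes_per_tree(height)


print(nodes_per_tree(3))         # 15
print(nodes_per_tree(8))         # 511
print(nodes_in_ensemble(10, 8))  # 5110 nodes with the default settings
```

Each extra unit of height roughly doubles the node count, while n_trees only multiplies it, so height is the parameter to raise most carefully.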

Methods

  • learn_one(x: dict) -> None -- Updates mass counters in each tree and pivots windows when the window is full.
  • score_one(x: dict) -> float -- Returns an anomaly score in [0, 1] where high values indicate anomalies.

I/O Contract

Inputs

| Method | Parameter | Type | Description |
|---|---|---|---|
| learn_one | x | dict | A dictionary mapping feature names to numeric values. Features should be in [0, 1] unless custom limits are provided. |
| score_one | x | dict | A dictionary mapping feature names to numeric values. |

Outputs

| Method | Return Type | Description |
|---|---|---|
| learn_one | None | Updates internal tree mass counters; no return value. |
| score_one | float | Anomaly score in [0, 1]. High scores indicate anomalies; low scores indicate normal observations. Returns 0 during the first window (before any reference data is available). |

Usage Examples

Basic anomaly scoring:

from river import anomaly

X = [0.5, 0.45, 0.43, 0.44, 0.445, 0.45, 0.0]
hst = anomaly.HalfSpaceTrees(
    n_trees=5,
    height=3,
    window_size=3,
    seed=42
)

# Warm up the model
for x in X[:3]:
    hst.learn_one({'x': x})

# Score observations
for x in X:
    features = {'x': x}
    hst.learn_one(features)
    print(f'Anomaly score for x={x:.3f}: {hst.score_one(features):.3f}')
# Anomaly score for x=0.500: 0.107
# Anomaly score for x=0.450: 0.071
# Anomaly score for x=0.430: 0.107
# Anomaly score for x=0.440: 0.107
# Anomaly score for x=0.445: 0.107
# Anomaly score for x=0.450: 0.071
# Anomaly score for x=0.000: 0.853

Pipeline with MinMaxScaler on CreditCard dataset:

from river import compose, preprocessing, anomaly, datasets, metrics

model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(seed=42)
)

auc = metrics.ROCAUC()

for x, y in datasets.CreditCard().take(2500):
    score = model.score_one(x)
    model.learn_one(x)
    auc.update(y, score)

print(auc)
# ROCAUC: 91.15%

Using progressive_val_score for evaluation:

from river import compose, preprocessing, anomaly, datasets, metrics, evaluate

model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(seed=42)
)

evaluate.progressive_val_score(
    dataset=datasets.CreditCard().take(2500),
    model=model,
    metric=metrics.ROCAUC(),
    print_every=1000
)
# [1,000] ROCAUC: 88.43%
# [2,000] ROCAUC: 89.28%
# [2,500] ROCAUC: 91.15%
# ROCAUC: 91.15%
