Implementation:Online ml River Anomaly HalfSpaceTrees
| Knowledge Sources | River, River Docs, Fast Anomaly Detection for Streaming Data |
|---|---|
| Domains | Online Machine Learning, Anomaly Detection, Ensemble Methods |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Concrete tool for performing online anomaly detection using Half-Space Trees in the River library, implementing an ensemble of randomly partitioned trees with a dual-window mass estimation scheme.
Description
The anomaly.HalfSpaceTrees class implements the Half-Space Trees algorithm for streaming anomaly detection. It builds an ensemble of n_trees random binary trees, each with the depth given by the height parameter, that partition the feature space using random axis-aligned splits. Every node keeps two mass counters, one for the working window and one for the reference window. After every window_size observations the windows are pivoted: the working counts become the new reference counts and the working counts start again from zero.
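The dual-window bookkeeping described above can be illustrated with a small pure-Python sketch. This is not River's internal code: the names Node, build_tree, walk, iter_nodes, learn and pivot are hypothetical, and drawing the split point uniformly inside the current range is an assumption made for illustration only.

```python
import random


class Node:
    """One node of a simplified half-space tree (illustrative only)."""

    def __init__(self, feature=None, threshold=None, left=None, right=None):
        self.feature = feature      # feature used for the split (None for a leaf)
        self.threshold = threshold  # split point along that feature
        self.left = left
        self.right = right
        self.working_mass = 0       # observations counted in the current window
        self.reference_mass = 0     # observations counted in the previous window


def build_tree(limits, height, rng):
    """Recursively apply random axis-aligned splits until the height is used up."""
    if height == 0:
        return Node()
    feature = rng.choice(sorted(limits))
    low, high = limits[feature]
    threshold = rng.uniform(low, high)  # assumption: uniform split point
    left = build_tree({**limits, feature: (low, threshold)}, height - 1, rng)
    right = build_tree({**limits, feature: (threshold, high)}, height - 1, rng)
    return Node(feature, threshold, left, right)


def walk(node, x):
    """Yield the nodes that x passes through, from the root down to a leaf."""
    while True:
        yield node
        if node.feature is None:
            return
        node = node.left if x[node.feature] < node.threshold else node.right


def iter_nodes(node):
    """Yield every node of the tree."""
    if node is None:
        return
    yield node
    yield from iter_nodes(node.left)
    yield from iter_nodes(node.right)


def learn(root, x):
    """Increment the working mass of every node on the path taken by x."""
    for node in walk(root, x):
        node.working_mass += 1


def pivot(root):
    """Promote working masses to reference masses and reset the working masses."""
    for node in iter_nodes(root):
        node.reference_mass = node.working_mass
        node.working_mass = 0


rng = random.Random(42)
root = build_tree({"x": (0.0, 1.0)}, height=2, rng=rng)
window_size = 3
for i, value in enumerate([0.5, 0.45, 0.43], start=1):
    learn(root, {"x": value})
    if i % window_size == 0:
        pivot(root)

# Every observation passes through the root, so after one full window
# the root's reference mass equals the window size.
print(root.reference_mass, root.working_mass)  # 3 0
```

Keeping two counters per node lets scoring rely on a stable, completed window while the next window is still being filled.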
By default, the implementation assumes features are bounded in [0, 1]. If features have different ranges, you can either specify explicit limits or prepend a preprocessing.MinMaxScaler in a pipeline.
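When the feature ranges are not known in advance, one possible way to obtain explicit limits is to compute them from a sample of historical observations. The helper infer_limits and the sample data below are hypothetical and only illustrate the {feature_name: (min, max)} shape; limits estimated from a finite sample may be exceeded by later observations.

```python
def infer_limits(samples):
    """Return {feature_name: (min, max)} over an iterable of feature dicts."""
    limits = {}
    for x in samples:
        for name, value in x.items():
            low, high = limits.get(name, (value, value))
            limits[name] = (min(low, value), max(high, value))
    return limits


# Hypothetical historical observations
history = [
    {"temperature": 18.5, "pressure": 1009.0},
    {"temperature": 24.0, "pressure": 1021.5},
    {"temperature": 21.2, "pressure": 1013.0},
]

limits = infer_limits(history)
print(limits)
# {'temperature': (18.5, 24.0), 'pressure': (1009.0, 1021.5)}

# The resulting dict has the shape described for the limits parameter,
# e.g. anomaly.HalfSpaceTrees(limits=limits)
```

If a representative sample is not available, prepending a scaler in a pipeline is likely the simpler choice.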
Trees are lazily constructed on the first call to learn_one, so the first call may be slower than subsequent ones. Time complexity for both learn_one and score_one scales linearly with the number of trees and exponentially with the height of each tree.
The class inherits from anomaly.base.AnomalyDetector.
Usage
Import and use anomaly.HalfSpaceTrees when:
- You need an online anomaly detector that works on streaming data
- You want normalized anomaly scores in [0, 1]
- You prefer an ensemble method with tunable complexity
- Your features are bounded in [0, 1] or you can apply min-max scaling upstream
Code Reference
Source Location
river/anomaly/hst.py, lines 94-286.
Signature
class HalfSpaceTrees(anomaly.base.AnomalyDetector):
    def __init__(
        self,
        n_trees=10,
        height=8,
        window_size=250,
        limits: dict[str, tuple[float, float]] | None = None,
        seed: int | None = None,
    ):
Import
from river import anomaly
model = anomaly.HalfSpaceTrees(n_trees=10, height=8, window_size=250, seed=42)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_trees | int | 10 | Number of trees in the ensemble. |
| height | int | 8 | Height of each tree. A tree of height h has 2^(h+1) - 1 nodes. |
| window_size | int | 250 | Number of observations per window. After this many observations, the working window is pivoted to the reference window. |
| limits | dict or None | None | Specifies the range of each feature as {feature_name: (min, max)}. Defaults to [0, 1] for all features. |
| seed | int or None | None | Random number seed for reproducibility. |
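To get a feel for how n_trees and height affect the size of the ensemble, the node-count formula from the table can be evaluated directly. The function name total_nodes is hypothetical; the arithmetic follows from 2^(h+1) - 1 nodes per tree.

```python
def total_nodes(n_trees: int, height: int) -> int:
    """Number of nodes held by an ensemble of complete binary trees."""
    return n_trees * (2 ** (height + 1) - 1)


for height in (3, 8, 12):
    print(height, total_nodes(n_trees=10, height=height))
# 3 150
# 8 5110
# 12 81910
```

Each extra level roughly doubles the node count, while adding trees only increases it proportionally.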
Methods
- learn_one(x: dict) -> None: Updates mass counters in each tree and pivots windows when the window is full.
- score_one(x: dict) -> float: Returns an anomaly score in [0, 1] where high values indicate anomalies.
I/O Contract
Inputs
| Method | Parameter | Type | Description |
|---|---|---|---|
| learn_one | x | dict | A dictionary mapping feature names to numeric values. Features should be in [0, 1] unless custom limits are provided. |
| score_one | x | dict | A dictionary mapping feature names to numeric values. |
Outputs
| Method | Return Type | Description |
|---|---|---|
| learn_one | None | Updates internal tree mass counters; no return value. |
| score_one | float | Anomaly score in [0, 1]. High scores indicate anomalies; low scores indicate normal observations. Returns 0 during the first window (before any reference data is available). |
Usage Examples
Basic anomaly scoring:
from river import anomaly
X = [0.5, 0.45, 0.43, 0.44, 0.445, 0.45, 0.0]
hst = anomaly.HalfSpaceTrees(
    n_trees=5,
    height=3,
    window_size=3,
    seed=42
)
# Warm up the model
for x in X[:3]:
    hst.learn_one({'x': x})
# Score observations
for x in X:
    features = {'x': x}
    hst.learn_one(features)
    print(f'Anomaly score for x={x:.3f}: {hst.score_one(features):.3f}')
# Anomaly score for x=0.500: 0.107
# Anomaly score for x=0.450: 0.071
# Anomaly score for x=0.430: 0.107
# Anomaly score for x=0.440: 0.107
# Anomaly score for x=0.445: 0.107
# Anomaly score for x=0.450: 0.071
# Anomaly score for x=0.000: 0.853
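The scores printed above can be turned into binary flags by comparing them with a cut-off. The sketch below reuses those printed values as plain data; the 0.5 threshold is a hypothetical choice for illustration, not a recommended default, and the related ThresholdFilter page covers the library's own mechanism.

```python
# (x, score) pairs copied from the output above
scored = [
    (0.5, 0.107),
    (0.45, 0.071),
    (0.43, 0.107),
    (0.44, 0.107),
    (0.445, 0.107),
    (0.45, 0.071),
    (0.0, 0.853),
]

THRESHOLD = 0.5  # hypothetical cut-off; tune it for the data at hand

# Keep the observations whose score exceeds the cut-off
flagged = [x for x, score in scored if score > THRESHOLD]
print(flagged)  # [0.0]
```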
Pipeline with MinMaxScaler on CreditCard dataset:
from river import compose, preprocessing, anomaly, datasets, metrics
model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(seed=42)
)
auc = metrics.ROCAUC()
for x, y in datasets.CreditCard().take(2500):
    score = model.score_one(x)
    model.learn_one(x)
    auc.update(y, score)
print(auc)
# ROCAUC: 91.15%
Using progressive_val_score for evaluation:
from river import compose, preprocessing, anomaly, datasets, metrics, evaluate
model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(seed=42)
)
evaluate.progressive_val_score(
    dataset=datasets.CreditCard().take(2500),
    model=model,
    metric=metrics.ROCAUC(),
    print_every=1000
)
# [1,000] ROCAUC: 88.43%
# [2,000] ROCAUC: 89.28%
# [2,500] ROCAUC: 91.15%
# ROCAUC: 91.15%
Related Pages
- Principle:Online_ml_River_Half_Space_Trees_Anomaly_Detection
- Implementation:Online_ml_River_Preprocessing_MinMaxScaler
- Implementation:Online_ml_River_Anomaly_ThresholdFilter
- Implementation:Online_ml_River_Anomaly_QuantileFilter
- Environment:Online_ml_River_Python_Runtime_Environment
- Heuristic:Online_ml_River_HST_Feature_Scaling_Requirement