Principle:Online ml River Streaming Anomaly Datasets

Knowledge Sources	River, River Docs
Domains	Online Machine Learning, Anomaly Detection, Benchmarking, Datasets
Last Updated	2026-02-08 16:00 GMT

Overview

River ships built-in, labeled benchmark datasets for evaluating online anomaly detection algorithms on streaming data. Because the anomaly proportions are known, results are reproducible and comparable across detectors.

Description

Streaming Anomaly Datasets are pre-packaged, labeled datasets included in River's datasets module that serve as standard benchmarks for evaluating anomaly detection algorithms. Each dataset provides a stream of (features, label) pairs, where the label indicates whether each observation is normal (0) or anomalous (1).

These datasets are critical for:

Reproducible evaluation: Standardized benchmarks allow direct comparison of different anomaly detection approaches.
Realistic class imbalance: Real-world anomaly detection problems are highly imbalanced, and these datasets reflect that: anomalies make up well under 1% of the observations in both.
Streaming compatibility: Datasets are designed to be iterated one observation at a time, matching River's online learning paradigm.

River provides two primary anomaly detection benchmark datasets:

CreditCard: A fraud detection dataset containing 284,807 credit card transactions from European cardholders over two days (September 2013). Only 492 transactions (0.172%) are fraudulent. Features are PCA-transformed (V1-V28) plus Time and Amount, totaling 30 features.

HTTP: An intrusion detection dataset from KDD Cup 1999 containing 567,498 HTTP connections, of which only 2,211 (0.39%) are anomalous. It has 3 numeric features (duration, src_bytes, dst_bytes).

Both datasets inherit from base.RemoteDataset, meaning they are downloaded on first use and cached locally.

Usage

Use streaming anomaly datasets when:

You need to benchmark an anomaly detection algorithm
You want to compare different detector configurations or algorithms
You need labeled data for computing ROCAUC or other classification metrics
You are prototyping an anomaly detection pipeline and need representative data
You want to reproduce results from River's documentation or papers

Theoretical Basis

Dataset characteristics:

Dataset	Samples	Features	Anomaly %	Domain	Source
CreditCard	284,807	30	0.172% (492 frauds)	Fraud Detection	ULB Machine Learning Group
HTTP	567,498	3	0.39% (2,211 anomalies)	Intrusion Detection	KDD Cup 1999
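
The anomaly percentages in the table follow directly from the counts. A minimal sketch of that arithmetic, using only the figures quoted above (the `DATASETS` dictionary and the `anomaly_rate` helper are illustrative names, not part of River):

```python
# Counts taken from the table above: (total samples, anomalous samples).
DATASETS = {
    "CreditCard": (284_807, 492),
    "HTTP": (567_498, 2_211),
}


def anomaly_rate(total: int, anomalies: int) -> float:
    """Return the share of anomalous observations as a percentage."""
    return 100 * anomalies / total


for name, (total, anomalies) in DATASETS.items():
    rate = anomaly_rate(total, anomalies)
    # Roughly one anomaly per `total / anomalies` observations.
    print(f"{name}: {rate:.2f}% anomalous, about 1 in {round(total / anomalies)}")
```

Both rates sit well below 1%, which is why the choice of evaluation metric matters so much for these benchmarks.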

Evaluation protocol for anomaly detection:

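Most anomaly detectors in River are unsupervised: they see only the features during training, never the labels. The labels in these datasets are therefore used exclusively for evaluation, not for training. Each observation is scored before the model learns from it, so every score is produced for data the model has not yet seen.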

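from river import anomaly, compose, datasets, metrics, preprocessing
dataset = datasets.CreditCard().take(2500)   # labeled stream of (x, y) pairs
model = compose.Pipeline(preprocessing.MinMaxScaler(), anomaly.HalfSpaceTrees(seed=42))
metric = metrics.ROCAUC()
for x, y in dataset:
    score = model.score_one(x)   # Score first (unsupervised, no label)
    metric.update(y, score)      # Evaluate against ground truth
    model.learn_one(x)           # Then learn (unsupervised, no y)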
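
To try the same score-then-learn protocol without downloading a dataset, the following dependency-free sketch may help. It is an illustration only: `RunningZScore` and `synthetic_stream` are hypothetical stand-ins written for this example (they mimic the `score_one` / `learn_one` interface but are not River classes), and the 1% anomaly rate is an arbitrary assumption.

```python
import random


class RunningZScore:
    """Toy unsupervised detector (hypothetical, not a River class).

    Scores a value by its absolute z-score against a running mean and
    variance maintained with Welford's algorithm.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score_one(self, x: dict) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = (self.m2 / (self.n - 1)) ** 0.5
        return abs(x["value"] - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x: dict) -> None:
        # The update only sees features; no label is passed in.
        self.n += 1
        delta = x["value"] - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x["value"] - self.mean)


def synthetic_stream(n: int, seed: int = 0):
    """Yield (features, label) pairs; roughly 1% are shifted outliers."""
    rng = random.Random(seed)
    for _ in range(n):
        is_anomaly = rng.random() < 0.01
        value = rng.gauss(10.0, 1.0) if is_anomaly else rng.gauss(0.0, 1.0)
        yield {"value": value}, int(is_anomaly)


model = RunningZScore()
scores_by_label = {0: [], 1: []}
for x, y in synthetic_stream(5_000):
    score = model.score_one(x)         # 1. score before the model has seen x
    scores_by_label[y].append(score)   # 2. use the label for evaluation only
    model.learn_one(x)                 # 3. update the model without the label

mean_normal = sum(scores_by_label[0]) / len(scores_by_label[0])
mean_anomalous = sum(scores_by_label[1]) / len(scores_by_label[1])
print(f"mean score, normal: {mean_normal:.2f}; anomalous: {mean_anomalous:.2f}")
```

The label `y` never reaches the detector; it is only used to group the scores afterwards, which is the same separation the River loop relies on.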

Class imbalance considerations:

Standard accuracy is misleading (a model that always predicts "normal" achieves >99% accuracy)
ROCAUC is the recommended metric: it evaluates the quality of the score ranking regardless of threshold
ClassificationReport (precision, recall, F1) is useful once scores are turned into binary decisions at a specific threshold, for example with one of River's anomaly filters
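
The contrast between accuracy and ROC AUC can be checked on a tiny hand-made example. The sketch below uses plain Python; the `accuracy` and `roc_auc` helpers are illustrative and are not River's implementations (River's own ROCAUC metric may compute the value differently), and the label and score lists are made up for the demonstration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random anomaly outscores a random normal point.

    Ties count as half a win. This pairwise form is quadratic and only
    meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1,000 observations with 2 anomalies (0.2%), close to the CreditCard rate.
y_true = [0] * 998 + [1] * 2

# A "detector" that always answers normal and gives every point the same score.
always_normal = [0] * 1000
constant_scores = [0.0] * 1000

# A detector whose scores rank both anomalies above every normal point.
good_scores = [0.1] * 998 + [0.9, 0.8]

print(accuracy(y_true, always_normal))     # 0.998
print(roc_auc(y_true, constant_scores))    # 0.5
print(roc_auc(y_true, good_scores))        # 1.0
```

The useless detector looks excellent under accuracy but scores no better than chance under ROC AUC, while the detector with a correct ranking reaches the maximum regardless of where a threshold would later be placed.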

Streaming iteration:

Both datasets support the .take(n) method to limit the number of observations, useful for quick experiments:

from river import datasets
dataset = datasets.CreditCard().take(2500)  # Only the first 2,500 observations
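
Limiting a stream this way is conceptually the same as slicing any Python iterator. The sketch below shows the idea with `itertools.islice` on a hypothetical generator (`fake_stream` is invented for this example); it illustrates the behaviour and is not a statement about how River implements `.take`.

```python
from itertools import islice


def fake_stream():
    """Hypothetical endless stream of (features, label) pairs."""
    i = 0
    while True:
        yield {"amount": float(i)}, 0
        i += 1


# Keep only the first 2,500 observations; the rest are never generated.
subset = list(islice(fake_stream(), 2500))
print(len(subset))   # 2500
print(subset[0])     # ({'amount': 0.0}, 0)
```

Because the slice is lazy, the cost of a quick experiment depends on the number of observations requested rather than on the full size of the stream.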
