Principle:Online ml River Streaming Anomaly Datasets
| Knowledge Sources | River, River Docs |
|---|---|
| Domains | Online Machine Learning, Anomaly Detection, Benchmarking, Datasets |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
River ships built-in, labeled datasets for evaluating online anomaly detection algorithms on streaming data. Their known anomaly proportions make evaluation standardized and reproducible.
Description
Streaming Anomaly Datasets are pre-packaged, labeled datasets included in River's datasets module that serve as standard benchmarks for evaluating anomaly detection algorithms. Each dataset yields a stream of (features, label) pairs, where features is a dictionary mapping feature names to values and label indicates whether the observation is normal (0) or anomalous (1).
These datasets are critical for:
- Reproducible evaluation: Standardized benchmarks allow direct comparison of different anomaly detection approaches.
- Realistic class imbalance: Real-world anomaly detection problems are highly imbalanced, and these datasets reflect that: anomalies make up well under 1% of the observations in both.
- Streaming compatibility: Datasets are designed to be iterated one observation at a time, matching River's online learning paradigm.
River provides two primary anomaly detection benchmark datasets:
CreditCard: A fraud detection dataset containing 284,807 credit card transactions from European cardholders over two days (September 2013). Only 492 transactions (0.172%) are fraudulent. Features are PCA-transformed (V1-V28) plus Time and Amount, totaling 30 features.
HTTP: An intrusion detection dataset from KDD Cup 1999 containing 567,498 HTTP connections. Only 2,211 (0.39%) are anomalous. It has 3 numeric features (duration, src_bytes, dst_bytes).
Both datasets inherit from base.RemoteDataset, meaning they are downloaded on first use and cached locally.
Usage
Use streaming anomaly datasets when:
- You need to benchmark an anomaly detection algorithm
- You want to compare different detector configurations or algorithms
- You need labeled data for computing ROCAUC or other classification metrics
- You are prototyping an anomaly detection pipeline and need representative data
- You want to reproduce results from River's documentation or papers
Theoretical Basis
Dataset characteristics:
| Dataset | Samples | Features | Anomaly % | Domain | Source |
|---|---|---|---|---|---|
| CreditCard | 284,807 | 30 | 0.172% (492 frauds) | Fraud Detection | ULB Machine Learning Group |
| HTTP | 567,498 | 3 | 0.39% (2,211 anomalies) | Intrusion Detection | KDD Cup 1999 |
Evaluation protocol for anomaly detection:
Most anomaly detectors in River are unsupervised: they see only the features during training, never the labels. For these detectors, the labels in these datasets are used exclusively for evaluation. Each observation is scored before the model learns from it, so every score is an out-of-sample prediction:

    for x, y in dataset:
        score = model.score_one(x)  # Score first (label not used)
        metric.update(y, score)     # Evaluate against ground truth
        model.learn_one(x)          # Then learn (unsupervised, no y)
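The score-then-learn protocol can be sketched without downloading anything. The example below is a minimal, standard-library-only illustration: `RunningZScore`, `synthetic_stream`, and `roc_auc` are hypothetical stand-ins written for this sketch, not River classes or functions. The toy detector only mirrors the `score_one` / `learn_one` method shape shown above, and the data is synthetic.

```python
import random


class RunningZScore:
    """Toy stand-in for a detector (hypothetical, not part of River)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def score_one(self, x):
        # Anomaly score: distance from the running mean in standard deviations.
        if self.n < 2:
            return 0.0
        std = (self.m2 / (self.n - 1)) ** 0.5
        return abs(x["value"] - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x):
        # Update running statistics from the features only; no label is used.
        v = x["value"]
        self.n += 1
        delta = v - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (v - self.mean)


def synthetic_stream(n=2000, anomaly_rate=0.01, seed=0):
    """Yield (features, label) pairs; anomalies are drawn far from the normal data."""
    rng = random.Random(seed)
    for _ in range(n):
        y = 1 if rng.random() < anomaly_rate else 0
        value = rng.gauss(8.0, 1.0) if y else rng.gauss(0.0, 1.0)
        yield {"value": value}, y


def roc_auc(labels, scores):
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


model = RunningZScore()
labels, scores = [], []
for x, y in synthetic_stream():
    scores.append(model.score_one(x))  # 1. score before the model has seen x
    labels.append(y)                   # 2. keep the label for evaluation only
    model.learn_one(x)                 # 3. learn from features, never from y

auc = roc_auc(labels, scores)
print(f"ROC AUC on the synthetic stream: {auc:.3f}")
```

With a real River detector and metric, the loop body keeps the same three steps; only the model, metric, and dataset objects change.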
Class imbalance considerations:
- Standard accuracy is misleading: a model that always predicts "normal" achieves about 99.8% accuracy on CreditCard and 99.6% on HTTP while detecting no anomalies at all
- ROCAUC is the recommended metric: it evaluates the quality of the score ranking regardless of threshold
- ClassificationReport (precision, recall, F1) is useful once scores are converted to binary predictions with a specific threshold, for example through anomaly.ThresholdFilter or anomaly.QuantileFilter
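The accuracy pitfall follows directly from the counts in the dataset table. The short calculation below uses only those published counts; the `baseline` dictionary and variable names are illustrative.

```python
# (total samples, anomalies), taken from the dataset characteristics table
counts = {"CreditCard": (284_807, 492), "HTTP": (567_498, 2_211)}

baseline = {}
for name, (n_samples, n_anomalies) in counts.items():
    # A model that always predicts "normal" is right on every normal observation...
    accuracy = (n_samples - n_anomalies) / n_samples
    # ...and never flags a single anomaly.
    recall = 0 / n_anomalies
    baseline[name] = (accuracy, recall)
    print(f"{name}: accuracy={accuracy:.4%}, recall={recall:.0%}")
```

Accuracy stays above 99.6% on both datasets even though recall is zero, which is why a ranking metric or a precision/recall report is the more informative choice.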
Streaming iteration:
Both datasets support the .take(n) method to limit the number of observations, useful for quick experiments:
    from river.datasets import CreditCard
    dataset = CreditCard().take(2500)  # Only the first 2,500 observations
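The effect of `.take(n)` can be illustrated without downloading a dataset. The sketch below assumes that `.take(n)` behaves like `itertools.islice(dataset, n)`, that is, a lazy iterator over the first n pairs. `toy_stream` is a hypothetical stand-in that borrows the HTTP feature names with made-up values.

```python
from itertools import islice


def toy_stream(n):
    """Hypothetical stand-in for a dataset: yields (features, label) pairs lazily."""
    for i in range(n):
        features = {"duration": float(i), "src_bytes": 0.0, "dst_bytes": 0.0}
        yield features, 0


# Only five pairs are ever generated, even though the stream is a million long.
first_five = list(islice(toy_stream(1_000_000), 5))

for x, y in first_five:
    print(x, y)
```

Under this assumption, the truncated stream is a single-pass iterator: running a second experiment requires calling `.take(n)` again on a fresh dataset object.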