Implementation:Online ml River Cluster DBSTREAM
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River; River Docs; Clustering Data Streams Based on Shared Density between Micro-Clusters (Hahsler and Bolanos, 2016) | Online Clustering, Density-Based Clustering | 2026-02-08 16:00 GMT |
Overview
Concrete tool for performing DBSTREAM density-based clustering on evolving data streams, maintaining micro-clusters with a shared density graph and producing macro-clusters via connected components.
Description
The cluster.DBSTREAM class implements the DBSTREAM algorithm for streaming density-based clustering. It maintains a set of micro-clusters, each defined by a center position, a weight (which fades over time), and a last-update timestamp. A shared density graph tracks the co-occurrence of micro-cluster activations. On prediction, the algorithm reclusters using a DBSCAN variant on the shared density graph to produce macro-clusters.
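The fading of a micro-cluster weight can be illustrated with a small standalone sketch. It assumes the exponential form `2 ** (-fading_factor * elapsed)` described in the DBSTREAM paper; the helper name `faded_weight` is hypothetical and is not part of River's API.

```python
def faded_weight(weight: float, fading_factor: float, elapsed: float) -> float:
    """Weight left after `elapsed` time steps, assuming 2^(-lambda * dt) fading."""
    return weight * 2 ** (-fading_factor * elapsed)


# With fading_factor=0.01, a weight halves every 100 time steps.
print(faded_weight(1.0, 0.01, 100))  # 0.5
```

Under this assumption, a larger `fading_factor` makes the model forget old observations faster, which is why the parameter governs how quickly micro-clusters weaken between updates.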
Key internal state includes micro_clusters (the set of active micro-clusters), a shared density matrix s, and timestamp tracking for both micro-clusters and shared densities. The cleanup process periodically removes weak micro-clusters and weak shared density entries.
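A minimal sketch of the cleanup idea follows. It assumes the rule from the DBSTREAM paper that a micro-cluster is weak once its faded weight drops below `2 ** (-fading_factor * cleanup_interval)`; the `MicroCluster` dataclass and the `cleanup` function are hypothetical stand-ins, not River's internal classes or methods.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical stand-in for a micro-cluster (not River's class)."""
    center: dict
    weight: float
    last_update: int


def cleanup(micro_clusters: dict, now: int, fading_factor: float,
            cleanup_interval: float) -> dict:
    """Return only the micro-clusters whose faded weight is not weak."""
    # Assumed weak-weight threshold taken from the paper's description.
    weak_threshold = 2 ** (-fading_factor * cleanup_interval)
    kept = {}
    for key, mc in micro_clusters.items():
        # Fade the stored weight up to the current time step.
        current = mc.weight * 2 ** (-fading_factor * (now - mc.last_update))
        if current >= weak_threshold:
            kept[key] = mc
    return kept


mcs = {
    0: MicroCluster({0: 1.0, 1: 1.0}, weight=1.0, last_update=0),
    1: MicroCluster({0: 4.0, 1: 3.0}, weight=3.0, last_update=8),
}
survivors = cleanup(mcs, now=10, fading_factor=0.05, cleanup_interval=4)
print(sorted(survivors))  # [1]
```

In this example the first micro-cluster has not been updated for 10 steps and has faded below the threshold, so only the second one survives.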
Usage
Import cluster.DBSTREAM when you need online density-based clustering that discovers clusters of arbitrary shape and automatically determines the number of clusters. It is suitable for evolving data streams where clusters may appear, disappear, or change shape over time.
Code Reference
Source Location
river/cluster/dbstream.py:L11-L443
Signature
class DBSTREAM(base.Clusterer):
    def __init__(
        self,
        clustering_threshold: float = 1.0,
        fading_factor: float = 0.01,
        cleanup_interval: float = 2,
        intersection_factor: float = 0.3,
        minimum_weight: float = 1.0
    )
Import
from river import cluster
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| clustering_threshold | 1.0 | Radius around each micro-cluster center; a point within this distance joins the micro-cluster. |
| fading_factor | 0.01 | Controls the exponential weight decay rate. Must be nonzero. |
| cleanup_interval | 2 | Time steps between consecutive cleanup passes that remove weak micro-clusters. |
| intersection_factor | 0.3 | Threshold for shared density; determines whether micro-clusters are connected in the density graph. |
| minimum_weight | 1.0 | Minimum weight for a micro-cluster to be considered "strong" during reclustering. |
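How `intersection_factor` and `minimum_weight` interact during reclustering can be sketched in plain Python. The sketch assumes the connectivity measure from the DBSTREAM paper (shared density divided by the mean weight of the two micro-clusters) and labels connected components with a depth-first search. The function name `macro_cluster_labels` and the input layout are hypothetical; River's actual reclustering code may differ in details such as strict versus non-strict comparisons.

```python
from itertools import combinations


def macro_cluster_labels(weights, shared, intersection_factor, minimum_weight):
    """Label strong micro-clusters by connected component of the density graph.

    weights: {micro_cluster_id: weight}
    shared:  {(i, j): shared density} with i < j
    """
    # Only strong micro-clusters take part in reclustering (assumed rule).
    strong = sorted(i for i, w in weights.items() if w >= minimum_weight)

    # Connect i and j when shared density relative to mean weight is high enough.
    adjacency = {i: set() for i in strong}
    for i, j in combinations(strong, 2):
        connectivity = shared.get((i, j), 0.0) / ((weights[i] + weights[j]) / 2)
        if connectivity > intersection_factor:
            adjacency[i].add(j)
            adjacency[j].add(i)

    # Depth-first search assigns one label per connected component.
    labels = {}
    next_label = 0
    for start in strong:
        if start in labels:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in labels:
                continue
            labels[node] = next_label
            stack.extend(n for n in adjacency[node] if n not in labels)
        next_label += 1
    return labels


weights = {0: 4.0, 1: 3.0, 2: 5.0, 3: 0.5}
shared = {(0, 1): 2.5, (1, 2): 0.4}
labels = macro_cluster_labels(weights, shared,
                              intersection_factor=0.5, minimum_weight=1.0)
print(labels)  # {0: 0, 1: 0, 2: 1}
```

Micro-clusters 0 and 1 merge because their connectivity (2.5 / 3.5) exceeds 0.5, micro-cluster 2 stays separate, and micro-cluster 3 is dropped as too light.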
Methods
| Method | Signature | Description |
|---|---|---|
| learn_one | learn_one(x: dict, w=None) -> None | Updates micro-clusters with observation x; triggers cleanup if at the scheduled interval. |
| predict_one | predict_one(x: dict, w=None) -> int | Triggers reclustering if needed and returns the macro-cluster assignment for x. |
Key Attributes
| Attribute | Type | Description |
|---|---|---|
| n_clusters | int | Number of macro-clusters generated after reclustering. |
| clusters | dict[int, DBSTREAMMicroCluster] | Final macro-clusters (merged micro-clusters with the same label). |
| centers | dict | Centers of the final macro-clusters. |
| micro_clusters | dict[int, DBSTREAMMicroCluster] | Current set of micro-clusters maintained by the online phase. |
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| x | dict | A dictionary mapping feature names to numeric values representing one observation. |
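To make the input format concrete, the sketch below turns a list row into a dictionary keyed by column position and checks which centers lie within a given radius of it. It uses Euclidean distance as an assumption, and the helpers `euclidean` and `neighbours` are hypothetical illustrations rather than River functions.

```python
import math


def euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between two observations with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def neighbours(x: dict, centers: dict, clustering_threshold: float) -> list:
    """Ids of centers strictly closer to x than the threshold radius."""
    return [key for key, center in centers.items()
            if euclidean(x, center) < clustering_threshold]


row = [1, 2]
x = dict(enumerate(row))  # {0: 1, 1: 2}, feature names are column positions
centers = {0: {0: 1.0, 1: 1.0}, 1: {0: 4.0, 1: 3.0}}
print(neighbours(x, centers, clustering_threshold=1.5))  # [0]
```

Named features work the same way, for example `{"width": 1, "height": 2}`, as long as every observation uses the same keys.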
Outputs
| Output | Type | Description |
|---|---|---|
| predict_one return | int | The macro-cluster index assigned to the observation. |
Usage Examples
from river import cluster
from river import stream

X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125], [1, 1.5], [1, 1.75],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3], [4, 3.25], [4, 3.5]
]

dbstream = cluster.DBSTREAM(
    clustering_threshold=1.5,
    fading_factor=0.05,
    cleanup_interval=4,
    intersection_factor=0.5,
    minimum_weight=1
)

for x, _ in stream.iter_array(X):
    dbstream.learn_one(x)

dbstream.predict_one({0: 1, 1: 2})
# 0

dbstream.predict_one({0: 5, 1: 2})
# 1

dbstream.n_clusters
# 2