Implementation:Online ml River Cluster DenStream
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River; River Docs; Density-Based Clustering over an Evolving Data Stream with Noise (Cao et al., 2006) | Online Clustering, Density-Based Clustering | 2026-02-08 16:00 GMT |
Overview
Concrete tool for performing DenStream density-based clustering on evolving data streams, maintaining potential and outlier micro-clusters with exponential decay and producing final clusters via offline DBSCAN.
Description
The cluster.DenStream class implements the DenStream algorithm. It maintains two collections of micro-clusters: p_micro_clusters (potential, representing genuine cluster regions) and o_micro_clusters (outlier, representing noise or emerging clusters). Each micro-cluster tracks its count, linear sum, squared sum, and timestamps, enabling computation of weight, center, and radius with exponential decay.
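The decayed statistics described above can be sketched in plain Python. This is a minimal illustration that assumes the fading function f(dt) = 2^(-decaying_factor * dt) from Cao et al. (2006) and reads the radius as the square root of the summed per-dimension decayed variance. The helper names `fading` and `micro_cluster_stats` are hypothetical and are not part of River's API; River's internal bookkeeping may differ.

```python
import math


def fading(dt, decaying_factor):
    """Fading function 2^(-lambda * dt); older points contribute less."""
    return 2.0 ** (-decaying_factor * dt)


def micro_cluster_stats(points, timestamps, now, decaying_factor):
    """Return (weight, center, radius) of one micro-cluster at time `now`.

    `points` is a list of equal-length numeric tuples; `timestamps` holds
    the arrival time of each point. Hypothetical helper, not River code.
    """
    dims = len(points[0])
    weight = 0.0
    linear_sum = [0.0] * dims   # decayed sum of coordinates
    squared_sum = [0.0] * dims  # decayed sum of squared coordinates
    for p, t in zip(points, timestamps):
        f = fading(now - t, decaying_factor)
        weight += f
        for d in range(dims):
            linear_sum[d] += f * p[d]
            squared_sum[d] += f * p[d] ** 2
    center = [s / weight for s in linear_sum]
    # Sum of per-dimension decayed variances; clamp tiny negative values
    # that can appear through floating-point rounding.
    variance = sum(sq / weight - c ** 2 for sq, c in zip(squared_sum, center))
    radius = math.sqrt(max(variance, 0.0))
    return weight, center, radius


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0)]
    print(micro_cluster_stats(pts, [0, 0], now=0, decaying_factor=0.25))
    print(micro_cluster_stats(pts, [0, 0], now=4, decaying_factor=0.25))
```

When all points share the same timestamp, decay shrinks the weight while leaving the center and radius unchanged, because the fading factor cancels in both ratios.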
The class has an initialization phase that buffers n_samples_init points before applying an initial DBSCAN to seed the potential micro-clusters. After initialization, each new point is merged into the nearest suitable micro-cluster, or a new outlier micro-cluster is created. Periodic pruning removes decayed micro-clusters. On prediction, a DBSCAN variant on p-micro-cluster centers produces the final clustering.
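The merge-or-create step of the online phase can be illustrated with a simplified sketch. It assumes the decision rule from Cao et al. (2006): try the nearest potential micro-cluster, then the nearest outlier micro-cluster, accept the point only if the resulting radius stays within epsilon, and promote an outlier micro-cluster once its weight exceeds beta * mu. To keep the example short, decay is ignored (weight is just the point count) and micro-clusters are plain lists of points. All function names are hypothetical, not River's.

```python
import math


def center(points):
    """Arithmetic mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def radius(points):
    """Root-mean-square distance of the points to their center."""
    c = center(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))


def nearest_index(x, clusters):
    """Index of the micro-cluster whose center is closest to x."""
    return min(range(len(clusters)),
               key=lambda i: math.dist(center(clusters[i]), x))


def insert_point(x, p_clusters, o_clusters, epsilon, beta, mu):
    """Merge x into a micro-cluster or open a new outlier micro-cluster."""
    # 1. Try the nearest potential micro-cluster.
    if p_clusters:
        i = nearest_index(x, p_clusters)
        if radius(p_clusters[i] + [x]) <= epsilon:
            p_clusters[i].append(x)
            return "merged_potential"
    # 2. Otherwise try the nearest outlier micro-cluster.
    if o_clusters:
        i = nearest_index(x, o_clusters)
        if radius(o_clusters[i] + [x]) <= epsilon:
            o_clusters[i].append(x)
            # Promote once the (undecayed) weight exceeds beta * mu.
            if len(o_clusters[i]) > beta * mu:
                p_clusters.append(o_clusters.pop(i))
                return "promoted"
            return "merged_outlier"
    # 3. No micro-cluster can absorb x: start a new outlier micro-cluster.
    o_clusters.append([x])
    return "new_outlier"
```

A real implementation would also apply decay to the weights and periodically prune micro-clusters whose weight has fallen too low, as described above.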
Usage
Import cluster.DenStream when you need density-based online clustering that explicitly handles noise through the potential/outlier micro-cluster distinction. It is particularly useful for streams where clusters have varying densities and noise points are common.
Code Reference
Source Location
river/cluster/denstream.py:L11-L392
Signature
class DenStream(base.Clusterer):
    def __init__(
        self,
        decaying_factor: float = 0.25,
        beta: float = 0.75,
        mu: float = 2,
        epsilon: float = 0.02,
        n_samples_init: int = 1000,
        stream_speed: int = 100
    )
Import
from river import cluster
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| decaying_factor | 0.25 | Controls the exponential decay rate of micro-cluster weights. Must be nonzero. |
| beta | 0.75 | Outlier threshold multiplier. Must be in the range (0, 1]. |
| mu | 2 | Core micro-cluster weight threshold. Must satisfy mu > 1/beta. |
| epsilon | 0.02 | Neighborhood radius: a micro-cluster accepts a new point only if its radius after the merge stays at or below this value. |
| n_samples_init | 1000 | Number of points buffered for initial DBSCAN before online phase begins. |
| stream_speed | 100 | Number of points per unit time step; controls how frequently the timestamp increments. |
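The constraints in the table can be checked with a small sketch, which also shows how the same three parameters determine a pruning interval. The interval formula is taken from Cao et al. (2006), assumes the base-2 fading function and a positive `decaying_factor`, and is not confirmed to match River's internal computation. Both function names are hypothetical.

```python
import math


def check_denstream_params(decaying_factor, beta, mu):
    """Raise ValueError if the documented parameter constraints are violated."""
    if decaying_factor == 0:
        raise ValueError("decaying_factor must be nonzero")
    if not 0 < beta <= 1:
        raise ValueError("beta must be in the range (0, 1]")
    if not mu > 1 / beta:
        raise ValueError("mu must be greater than 1/beta")


def pruning_period(decaying_factor, beta, mu):
    """Minimal time for a p-micro-cluster to decay below beta * mu.

    Follows T_p = ceil((1 / lambda) * log2(beta*mu / (beta*mu - 1))) from
    the paper; assumes decaying_factor > 0.
    """
    check_denstream_params(decaying_factor, beta, mu)
    bm = beta * mu
    return math.ceil(math.log2(bm / (bm - 1)) / decaying_factor)


if __name__ == "__main__":
    # With the class defaults: beta * mu = 1.5, ratio = 3, log2(3) / 0.25 ~ 6.34
    print(pruning_period(0.25, 0.75, 2))  # 7
```

The requirement mu > 1/beta is what keeps beta * mu - 1 positive, so the logarithm in the interval is well defined.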
Methods
| Method | Signature | Description |
|---|---|---|
| learn_one | learn_one(x: dict, w=None) -> None | Buffers during initialization; after initialization, merges x into the nearest micro-cluster or creates a new outlier micro-cluster. Triggers periodic pruning. |
| predict_one | predict_one(x: dict, w=None) -> int | Applies DBSCAN on p-micro-cluster centers to form macro-clusters and returns the cluster assignment for x. Returns 0 if the model is not yet initialized. |
Key Attributes
| Attribute | Type | Description |
|---|---|---|
| n_clusters | int | Number of final clusters after applying DBSCAN on p-micro-clusters. |
| clusters | dict[int, DenStreamMicroCluster] | Final macro-clusters after the offline DBSCAN phase. |
| p_micro_clusters | dict[int, DenStreamMicroCluster] | Current potential (core) micro-clusters. |
| o_micro_clusters | dict[int, DenStreamMicroCluster] | Current outlier micro-clusters. |
| centers | dict (property) | Centers of the final macro-clusters, computed via fading-weighted means. |
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| x | dict | A dictionary mapping feature names to numeric values. |
Outputs
| Output | Type | Description |
|---|---|---|
| predict_one return | int | The cluster index assigned to the observation. Returns 0 before initialization completes. |
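Because the input must be a dict rather than a list or array, row-oriented data has to be converted first. The sketch below shows one way to do that with a hypothetical helper, `row_to_features`, which is not part of River. With no names given it keys each value by its column index, which matches the integer-keyed dicts such as `{0: -1, 1: -2}` used in the usage example.

```python
def row_to_features(row, feature_names=None):
    """Convert a sequence of numbers into a feature dict.

    Hypothetical helper: keys default to the column index when no
    feature names are supplied.
    """
    if feature_names is None:
        return dict(enumerate(row))
    if len(feature_names) != len(row):
        raise ValueError("feature_names and row must have the same length")
    return dict(zip(feature_names, row))


if __name__ == "__main__":
    print(row_to_features([-1, -0.5]))              # {0: -1, 1: -0.5}
    print(row_to_features([4, 3.5], ["x", "y"]))    # {'x': 4, 'y': 3.5}
```

Whichever keys are chosen, the same keys should be used for both learn_one and predict_one so that features line up.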
Usage Examples
from river import cluster
from river import stream
X = [
    [-1, -0.5], [-1, -0.625], [-1, -0.75], [-1, -1], [-1, -1.125],
    [-1, -1.25], [-1.5, -0.5], [-1.5, -0.625], [-1.5, -0.75], [-1.5, -1],
    [-1.5, -1.125], [-1.5, -1.25], [1, 1.5], [1, 1.75], [1, 2],
    [4, 1.25], [4, 1.5], [4, 2.25], [4, 2.5], [4, 3],
    [4, 3.25], [4, 3.5], [4, 3.75], [4, 4],
]
denstream = cluster.DenStream(
    decaying_factor=0.01,
    beta=0.5,
    mu=2.5,
    epsilon=0.5,
    n_samples_init=10
)
for x, _ in stream.iter_array(X):
    denstream.learn_one(x)
denstream.predict_one({0: -1, 1: -2})
# 1
denstream.predict_one({0: 5, 1: 4})
# 2
denstream.predict_one({0: 1, 1: 1})
# 0
denstream.n_clusters
# 3