

Implementation:Online ml River Cluster DenStream

From Leeroopedia


| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River; River Docs; Density-Based Clustering over an Evolving Data Stream with Noise (Cao et al., 2006) | Online Clustering, Density-Based Clustering | 2026-02-08 16:00 GMT |

Overview

cluster.DenStream is a concrete tool for DenStream density-based clustering on evolving data streams. It maintains potential and outlier micro-clusters whose weights decay exponentially, and it produces the final clusters with an offline DBSCAN step.

Description

The cluster.DenStream class implements the DenStream algorithm. It maintains two collections of micro-clusters: p_micro_clusters (potential, representing genuine cluster regions) and o_micro_clusters (outlier, representing noise or emerging clusters). Each micro-cluster tracks its count, linear sum, squared sum, and timestamps, enabling computation of weight, center, and radius with exponential decay.
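The statistics described above can be illustrated with a minimal sketch. It assumes the fading function f(t) = 2^(-lambda * t) from Cao et al. (2006) and uses a hypothetical class, `MicroClusterSketch`, that is not River's `DenStreamMicroCluster`; the attribute names and the exact radius formula are assumptions made for illustration.

```python
import math


def fading(decaying_factor: float, elapsed: float) -> float:
    """Fading function f(t) = 2^(-lambda * t), as defined in the DenStream paper."""
    return 2.0 ** (-decaying_factor * elapsed)


class MicroClusterSketch:
    """Hypothetical, simplified micro-cluster (not River's internal class)."""

    def __init__(self, x: dict, timestamp: float, decaying_factor: float):
        self.decaying_factor = decaying_factor
        self.last_update = timestamp
        self.weight = 1.0                                  # decayed point count
        self.linear_sum = {k: float(v) for k, v in x.items()}
        self.squared_sum = {k: float(v) ** 2 for k, v in x.items()}

    def _decay_to(self, timestamp: float) -> None:
        # Scale all statistics by the fading factor for the elapsed time.
        factor = fading(self.decaying_factor, timestamp - self.last_update)
        self.weight *= factor
        for k in self.linear_sum:
            self.linear_sum[k] *= factor
            self.squared_sum[k] *= factor
        self.last_update = timestamp

    def insert(self, x: dict, timestamp: float) -> None:
        # Decay the old statistics first, then add the new point at full weight.
        self._decay_to(timestamp)
        self.weight += 1.0
        for k, v in x.items():
            self.linear_sum[k] = self.linear_sum.get(k, 0.0) + v
            self.squared_sum[k] = self.squared_sum.get(k, 0.0) + v * v

    def weight_at(self, timestamp: float) -> float:
        return self.weight * fading(self.decaying_factor, timestamp - self.last_update)

    def center(self) -> dict:
        return {k: s / self.weight for k, s in self.linear_sum.items()}

    def radius(self) -> float:
        # Square root of the summed per-feature weighted variance (clamped at 0).
        var = sum(
            self.squared_sum[k] / self.weight - (self.linear_sum[k] / self.weight) ** 2
            for k in self.linear_sum
        )
        return math.sqrt(max(var, 0.0))


mc = MicroClusterSketch({"x": 0.0, "y": 0.0}, timestamp=0, decaying_factor=0.25)
mc.insert({"x": 2.0, "y": 0.0}, timestamp=0)
print(mc.center(), mc.radius())   # {'x': 1.0, 'y': 0.0} 1.0
print(mc.weight_at(4))            # 1.0: the weight halves after 1 / 0.25 = 4 time units
```

Because the center and radius are ratios of decayed sums, the decay factor cancels when no new points arrive; only the weight shrinks over time.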

The class has an initialization phase that buffers n_samples_init points before applying an initial DBSCAN to seed the potential micro-clusters. After initialization, each new point is merged into the nearest suitable micro-cluster, or a new outlier micro-cluster is created. Periodic pruning removes decayed micro-clusters. On prediction, a DBSCAN variant on p-micro-cluster centers produces the final clustering.
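The online merge step can be sketched as follows. This is a simplified illustration that follows the merging procedure in the DenStream paper, not River's source code: decay is omitted for brevity, and `MC`, `try_merge`, and `merge_point` are hypothetical names.

```python
import copy
import math


class MC:
    """Hypothetical micro-cluster without decay, for brevity."""

    def __init__(self, x: dict):
        self.n = 1
        self.ls = dict(x)                                # linear sum per feature
        self.ss = {k: v * v for k, v in x.items()}       # squared sum per feature

    def insert(self, x: dict) -> None:
        self.n += 1
        for k, v in x.items():
            self.ls[k] += v
            self.ss[k] += v * v

    def center(self) -> dict:
        return {k: s / self.n for k, s in self.ls.items()}

    def radius(self) -> float:
        var = sum(self.ss[k] / self.n - (self.ls[k] / self.n) ** 2 for k in self.ls)
        return math.sqrt(max(var, 0.0))


def distance(a: dict, b: dict) -> float:
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def try_merge(x: dict, clusters: dict, epsilon: float):
    """Merge x into the nearest cluster if the resulting radius stays <= epsilon."""
    if not clusters:
        return None
    key = min(clusters, key=lambda k: distance(clusters[k].center(), x))
    candidate = copy.deepcopy(clusters[key])
    candidate.insert(x)                                  # tentative insertion
    if candidate.radius() <= epsilon:
        clusters[key] = candidate
        return key
    return None


def merge_point(x, p_mcs, o_mcs, epsilon, beta, mu) -> str:
    # 1. Try the nearest potential micro-cluster.
    if try_merge(x, p_mcs, epsilon) is not None:
        return "merged_potential"
    # 2. Otherwise try the nearest outlier micro-cluster.
    key = try_merge(x, o_mcs, epsilon)
    if key is not None:
        # Promote the outlier micro-cluster once its weight exceeds beta * mu.
        if o_mcs[key].n > beta * mu:
            p_mcs[max(p_mcs, default=-1) + 1] = o_mcs.pop(key)
            return "promoted"
        return "merged_outlier"
    # 3. Nothing fits: start a new outlier micro-cluster.
    o_mcs[max(o_mcs, default=-1) + 1] = MC(x)
    return "new_outlier"


p_mcs, o_mcs = {}, {}
for point in [{"x": 0.0, "y": 0.0}, {"x": 0.1, "y": 0.0}, {"x": 10.0, "y": 10.0}]:
    print(merge_point(point, p_mcs, o_mcs, epsilon=0.5, beta=0.5, mu=2.5))
```

The tentative insertion on a copy keeps a rejected point from altering the micro-cluster it was tested against.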

Usage

Import cluster.DenStream when you need density-based online clustering that explicitly handles noise through the potential/outlier micro-cluster distinction. It is particularly useful for streams where clusters have varying densities and noise points are common.

Code Reference

Source Location

river/cluster/denstream.py:L11-L392

Signature

class DenStream(base.Clusterer):
    def __init__(
        self,
        decaying_factor: float = 0.25,
        beta: float = 0.75,
        mu: float = 2,
        epsilon: float = 0.02,
        n_samples_init: int = 1000,
        stream_speed: int = 100
    )

Import

from river import cluster

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| decaying_factor | 0.25 | Controls the exponential decay rate of micro-cluster weights. Must be nonzero. |
| beta | 0.75 | Outlier threshold multiplier. Must be in the range (0, 1]. |
| mu | 2 | Core micro-cluster weight threshold. Must satisfy mu > 1/beta. |
| epsilon | 0.02 | Neighborhood radius: the maximum radius for a micro-cluster to accept a new point. |
| n_samples_init | 1000 | Number of points buffered for initial DBSCAN before online phase begins. |
| stream_speed | 100 | Number of points per unit time step; controls how frequently the timestamp increments. |
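The constraint mu > 1/beta guarantees beta * mu > 1, which is what makes the pruning period in the DenStream paper well defined: T_p = ceil((1 / lambda) * log(beta * mu / (beta * mu - 1))). The sketch below checks the constraints listed above and evaluates that formula. The helper names are hypothetical, and the base-2 logarithm is an assumption chosen to match the paper's fading function 2^(-lambda * t); the library's own computation may differ.

```python
import math


def check_params(decaying_factor: float, beta: float, mu: float) -> None:
    """Validate the constraints listed in the parameter table (hypothetical helper)."""
    if decaying_factor == 0:
        raise ValueError("decaying_factor must be nonzero")
    if not 0 < beta <= 1:
        raise ValueError("beta must be in (0, 1]")
    if not mu > 1 / beta:
        raise ValueError("mu must satisfy mu > 1/beta")


def pruning_period(decaying_factor: float, beta: float, mu: float) -> int:
    """T_p from Cao et al. (2006); log base 2 is an assumption of this sketch."""
    check_params(decaying_factor, beta, mu)
    bm = beta * mu                      # > 1 thanks to mu > 1/beta
    return math.ceil((1 / decaying_factor) * math.log2(bm / (bm - 1)))


print(pruning_period(0.25, 0.75, 2))    # 7 with the default parameters
```

A larger decaying_factor shortens the period, so stale micro-clusters are checked for removal more often.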

Methods

| Method | Signature | Description |
|---|---|---|
| learn_one | learn_one(x: dict, w=None) -> None | Buffers during initialization; after initialization, merges x into the nearest micro-cluster or creates a new outlier micro-cluster. Triggers periodic pruning. |
| predict_one | predict_one(x: dict, w=None) -> int | Applies DBSCAN on p-micro-cluster centers to form macro-clusters and returns the cluster assignment for x. Returns 0 if the model is not yet initialized. |
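The macro-clustering idea behind predict_one can be sketched as grouping micro-cluster centers that are linked by short hops, then assigning a query point to the group of its nearest center. This is an illustration only: `group_centers` and `assign` are hypothetical helpers, the weight condition on core micro-clusters is ignored, the label numbering is arbitrary, and the reach of 2 * epsilon follows the density-reachability definition in the DenStream paper rather than a confirmed detail of River's code.

```python
import math
from collections import deque


def distance(a: dict, b: dict) -> float:
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def group_centers(centers: dict, reach: float) -> dict:
    """Give the same label to centers connected by chains of hops <= reach."""
    labels = {}
    next_label = 0
    for start in centers:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:                                   # breadth-first expansion
            current = queue.popleft()
            for other in centers:
                if other not in labels and distance(centers[current], centers[other]) <= reach:
                    labels[other] = next_label
                    queue.append(other)
        next_label += 1
    return labels


def assign(x: dict, centers: dict, labels: dict) -> int:
    """Return the label of the micro-cluster center closest to x."""
    nearest = min(centers, key=lambda k: distance(centers[k], x))
    return labels[nearest]


epsilon = 0.5
centers = {
    0: {"x": 0.0, "y": 0.0},
    1: {"x": 0.8, "y": 0.0},
    2: {"x": 5.0, "y": 5.0},
}
labels = group_centers(centers, reach=2 * epsilon)
print(labels)                                          # {0: 0, 1: 0, 2: 1}
print(assign({"x": 4.0, "y": 4.0}, centers, labels))   # 1
```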

Key Attributes

| Attribute | Type | Description |
|---|---|---|
| n_clusters | int | Number of final clusters after applying DBSCAN on p-micro-clusters. |
| clusters | dict[int, DenStreamMicroCluster] | Final macro-clusters after the offline DBSCAN phase. |
| p_micro_clusters | dict[int, DenStreamMicroCluster] | Current potential (core) micro-clusters. |
| o_micro_clusters | dict[int, DenStreamMicroCluster] | Current outlier micro-clusters. |
| centers | dict (property) | Centers of the final macro-clusters, computed via fading-weighted means. |

I/O Contract

Inputs

| Parameter | Type | Description |
|---|---|---|
| x | dict | A dictionary mapping feature names to numeric values. |

Outputs

| Output | Type | Description |
|---|---|---|
| predict_one return | int | The cluster index assigned to the observation. Returns 0 before initialization completes. |

Usage Examples

from river import cluster
from river import stream

X = [
    [-1, -0.5], [-1, -0.625], [-1, -0.75], [-1, -1], [-1, -1.125],
    [-1, -1.25], [-1.5, -0.5], [-1.5, -0.625], [-1.5, -0.75], [-1.5, -1],
    [-1.5, -1.125], [-1.5, -1.25], [1, 1.5], [1, 1.75], [1, 2],
    [4, 1.25], [4, 1.5], [4, 2.25], [4, 2.5], [4, 3],
    [4, 3.25], [4, 3.5], [4, 3.75], [4, 4],
]

denstream = cluster.DenStream(
    decaying_factor=0.01,
    beta=0.5,
    mu=2.5,
    epsilon=0.5,
    n_samples_init=10
)

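# stream.iter_array yields (features, target) pairs; features are dicts keyed by column index.
# The first n_samples_init points are buffered before the online phase starts.
for x, _ in stream.iter_array(X):
    denstream.learn_one(x)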

denstream.predict_one({0: -1, 1: -2})
# 1

denstream.predict_one({0: 5, 1: 4})
# 2

denstream.predict_one({0: 1, 1: 1})
# 0

denstream.n_clusters
# 3
