Implementation:Online ml River Cluster DBSTREAM

From Leeroopedia


Knowledge Sources: River; River Docs; Clustering Data Streams Based on Shared Density between Micro-Clusters (Hahsler and Bolanos, 2016)
Domains: Online Clustering, Density-Based Clustering
Last Updated: 2026-02-08 16:00 GMT

Overview

Concrete tool for performing DBSTREAM density-based clustering on evolving data streams, maintaining micro-clusters with a shared density graph and producing macro-clusters via connected components.

Description

The cluster.DBSTREAM class implements the DBSTREAM algorithm for streaming density-based clustering. It maintains a set of micro-clusters, each defined by a center position, a weight that fades over time, and a last-update timestamp. A shared density graph records how often a new point falls within the clustering threshold of two micro-clusters at once, which estimates the density of the region between them. On prediction, the algorithm reclusters using a DBSCAN variant on the shared density graph to produce macro-clusters.
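The two ingredients of the online phase described above, weight fading and the radius test, can be sketched in plain Python. This is a minimal illustration, not River's code: the helper names faded_weight and within_threshold are hypothetical, and the base-2 decay and Euclidean distance are assumptions taken from the DBSTREAM paper.

```python
import math


def faded_weight(weight, last_update, now, fading_factor=0.01):
    """Decay a micro-cluster weight by 2 ** (-fading_factor * elapsed time).

    Assumption: base-2 exponential decay, as in the DBSTREAM paper.
    """
    elapsed = now - last_update
    return weight * 2 ** (-fading_factor * elapsed)


def within_threshold(center, x, clustering_threshold=1.0):
    """Return True if x lies inside the radius around a micro-cluster center.

    Assumption: Euclidean distance over the features present in the center.
    """
    dist = math.sqrt(sum((center[k] - x[k]) ** 2 for k in center))
    return dist < clustering_threshold


# With fading_factor=0.01, a weight halves after 100 time steps.
half = faded_weight(1.0, last_update=0, now=100, fading_factor=0.01)

# A point 0.5 away from the center joins; a point 3.0 away does not.
near = within_threshold({0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.5})
far = within_threshold({0: 1.0, 1: 1.0}, {0: 4.0, 1: 1.0})
```

A smaller fading_factor makes the model remember old observations longer; a larger one lets it adapt faster to drift.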

Key internal state includes micro_clusters (the set of active micro-clusters), a shared density matrix s, and timestamps for both micro-clusters and shared density entries. Every cleanup_interval time steps, a cleanup pass removes weak micro-clusters and weak shared density entries.
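A cleanup pass of this kind can be sketched as follows. The function cleanup_micro_clusters and its arguments are hypothetical, and the weak-weight threshold 2 ** (-fading_factor * cleanup_interval) is an assumption based on the DBSTREAM paper rather than a statement about River's internals.

```python
def cleanup_micro_clusters(weights, last_updates, now,
                           fading_factor=0.01, cleanup_interval=2):
    """Return the ids of micro-clusters that survive one cleanup pass.

    weights and last_updates are dicts keyed by micro-cluster id.
    Assumption: a micro-cluster is weak when its faded weight drops
    to or below 2 ** (-fading_factor * cleanup_interval).
    """
    weak_threshold = 2 ** (-fading_factor * cleanup_interval)
    survivors = set()
    for mc_id, weight in weights.items():
        elapsed = now - last_updates[mc_id]
        faded = weight * 2 ** (-fading_factor * elapsed)
        if faded > weak_threshold:
            survivors.add(mc_id)
    return survivors


# Micro-cluster 0 was updated recently and is heavy; micro-cluster 1 has
# weight 1.0 and was last touched 100 steps ago, so it has faded to 0.5.
kept = cleanup_micro_clusters(
    weights={0: 5.0, 1: 1.0},
    last_updates={0: 90, 1: 0},
    now=100,
)
```

The same faded-weight test could be applied to shared density entries, using their own timestamps.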

Usage

Import cluster.DBSTREAM when you need online density-based clustering that discovers clusters of arbitrary shape and automatically determines the number of clusters. It is suitable for evolving data streams where clusters may appear, disappear, or change shape over time.

Code Reference

Source Location

river/cluster/dbstream.py:L11-L443

Signature

class DBSTREAM(base.Clusterer):
    def __init__(
        self,
        clustering_threshold: float = 1.0,
        fading_factor: float = 0.01,
        cleanup_interval: float = 2,
        intersection_factor: float = 0.3,
        minimum_weight: float = 1.0
    )

Import

from river import cluster

Key Parameters

Parameter | Default | Description
clustering_threshold | 1.0 | Radius around each micro-cluster center; a point within this distance joins the micro-cluster.
fading_factor | 0.01 | Controls the exponential weight decay rate. Must be nonzero.
cleanup_interval | 2 | Time steps between consecutive cleanup passes that remove weak micro-clusters.
intersection_factor | 0.3 | Threshold for shared density; determines whether micro-clusters are connected in the density graph.
minimum_weight | 1.0 | Minimum weight for a micro-cluster to be considered "strong" during reclustering.
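To illustrate how intersection_factor and minimum_weight interact during reclustering, here is a minimal sketch that labels connected components of a shared density graph. The function macro_cluster_labels is hypothetical. The connectivity rule (shared density divided by the mean of the two weights) follows the DBSTREAM paper and is an assumption about, not a copy of, River's implementation.

```python
def macro_cluster_labels(weights, shared, intersection_factor=0.3,
                         minimum_weight=1.0):
    """Label strong micro-clusters by connected component.

    weights: dict of micro-cluster id -> weight.
    shared:  dict of (i, j) -> shared density between micro-clusters i and j.
    Assumption: i and j are connected when
    shared[i, j] / ((w_i + w_j) / 2) > intersection_factor.
    """
    # Only strong micro-clusters take part in reclustering.
    strong = {i for i, w in weights.items() if w >= minimum_weight}

    # Build an undirected adjacency map from the shared density entries.
    adjacency = {i: set() for i in strong}
    for (i, j), s_ij in shared.items():
        if i in strong and j in strong:
            connectivity = s_ij / ((weights[i] + weights[j]) / 2)
            if connectivity > intersection_factor:
                adjacency[i].add(j)
                adjacency[j].add(i)

    # Depth-first search assigns one label per connected component.
    labels = {}
    next_label = 0
    for start in sorted(strong):
        if start in labels:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in labels:
                continue
            labels[node] = next_label
            stack.extend(n for n in adjacency[node] if n not in labels)
        next_label += 1
    return labels


labels = macro_cluster_labels(
    weights={0: 3.0, 1: 3.0, 2: 3.0, 3: 0.2},
    shared={(0, 1): 1.5, (1, 2): 0.3},
)
```

In this example micro-clusters 0 and 1 merge (connectivity 0.5), micro-cluster 2 stays separate (connectivity 0.1), and micro-cluster 3 is ignored because its weight is below minimum_weight.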

Methods

Method | Signature | Description
learn_one | learn_one(x: dict, w=None) -> None | Updates micro-clusters with observation x; triggers cleanup if at the scheduled interval.
predict_one | predict_one(x: dict, w=None) -> int | Triggers reclustering if needed and returns the macro-cluster assignment for x.

Key Attributes

Attribute | Type | Description
n_clusters | int | Number of macro-clusters generated after reclustering.
clusters | dict[int, DBSTREAMMicroCluster] | Final macro-clusters (merged micro-clusters with the same label).
centers | dict | Centers of the final macro-clusters.
micro_clusters | dict[int, DBSTREAMMicroCluster] | Current set of micro-clusters maintained by the online phase.

I/O Contract

Inputs

Parameter | Type | Description
x | dict | A dictionary mapping feature names to numeric values representing one observation.
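As a small illustration of this input format, a positional row can be turned into a feature dictionary before it is passed to learn_one or predict_one. The helper row_to_features is hypothetical and uses only the standard library. Integer keys mirror the {0: 1, 1: 2} style used in the usage example on this page.

```python
def row_to_features(row, names=None):
    """Map a sequence of numeric values to a feature dictionary.

    Without names, features are keyed by position (0, 1, ...).
    With names, each value is keyed by the corresponding name.
    """
    if names is None:
        return dict(enumerate(row))
    if len(names) != len(row):
        raise ValueError("names and row must have the same length")
    return dict(zip(names, row))


by_index = row_to_features([1, 0.5])
by_name = row_to_features([4, 2.25], names=["x", "y"])
```

Whichever keys are chosen, they should be used consistently for learning and prediction so that observations are compared on the same features.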

Outputs

Output | Type | Description
predict_one return | int | The macro-cluster index assigned to the observation.

Usage Examples

from river import cluster
from river import stream

X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125], [1, 1.5], [1, 1.75],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3], [4, 3.25], [4, 3.5]
]

dbstream = cluster.DBSTREAM(
    clustering_threshold=1.5,
    fading_factor=0.05,
    cleanup_interval=4,
    intersection_factor=0.5,
    minimum_weight=1
)

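# Online phase: update the micro-clusters one observation at a time.
for x, _ in stream.iter_array(X):
    dbstream.learn_one(x)

# Prediction triggers reclustering if needed and returns a macro-cluster index.
dbstream.predict_one({0: 1, 1: 2})
# 0

dbstream.predict_one({0: 5, 1: 2})
# 1

# Number of macro-clusters produced by the most recent reclustering.
dbstream.n_clusters
# 2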
