
Implementation:Online ml River Cluster Centers Inspection

From Leeroopedia


Knowledge Sources: River, River Docs
Domains: Online Clustering, Model Inspection, Concept Drift
Last Updated: 2026-02-08 16:00 GMT

Overview

Concrete documentation of the pattern for accessing and inspecting the internal state of River's online clustering models (centroids, micro-cluster collections, and weights) in order to monitor how cluster structures evolve over time.

Description

This Pattern Doc describes how to inspect the evolving internal state of River's clustering algorithms. Each algorithm exposes different attributes that reveal its current cluster structure:

  • KMeans exposes model.centers, a dict[int, defaultdict] mapping cluster IDs to centroid positions. Each centroid is a defaultdict that lazily initializes unseen feature dimensions.
  • DBSTREAM exposes model.micro_clusters (raw micro-clusters with centers and weights), model.clusters (macro-clusters after reclustering), and model.centers (macro-cluster centers). The shared density graph model.s tracks inter-micro-cluster density relationships.
  • DenStream exposes model.p_micro_clusters (potential/core micro-clusters) and model.o_micro_clusters (outlier micro-clusters). Each micro-cluster has attributes for linear sum, squared sum, count, creation time, and last edit time.

By reading these attributes at regular intervals during the learn/predict loop, users can track how the number, position, and weight of clusters change over time.
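Each KMeans centroid is a defaultdict, so reading it with square brackets is not a pure read. Looking up an unseen key calls the default factory, and the new value is stored in the model. The sketch below shows the difference on a stand-in centroid. It assumes the Gaussian initialization can be modelled as functools.partial(rng.gauss, mu, sigma). The feature names "x1" and "x2" are made up for illustration.

```python
import functools
import random
from collections import defaultdict

# Hypothetical stand-in for one KMeans centroid (assumption: Gaussian default factory).
# Any feature the centroid has not seen gets a random Gaussian value on first lookup.
rng = random.Random(42)
centroid = defaultdict(functools.partial(rng.gauss, 0.0, 1.0))
centroid["x1"] = 0.5  # a feature the model has already learned

# Safe reads: .get() and `in` never call the default factory.
assert centroid.get("x2") is None
assert "x2" not in centroid

# Safe snapshot: dict() copies only the stored items, so the centroid is not modified.
snapshot = dict(centroid)
print(snapshot)  # {'x1': 0.5}

# Unsafe read: indexing an unseen key inserts a random value into the centroid itself.
_ = centroid["x2"]
print(sorted(centroid))  # ['x1', 'x2']
```

For a whole model, the same idea gives a mutation-free snapshot: {cid: dict(c) for cid, c in k_means.centers.items()}.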

Usage

Reference this pattern when you need to build monitoring, visualization, or drift-detection logic around River's online clustering models. Access the attributes directly on the model instance after calling learn_one or predict_one.
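One simple drift signal is how far each center moves between two snapshots. The sketch below measures that distance and flags the clusters that moved more than a threshold. Three choices in it are assumptions, not River behavior:

  • A missing feature counts as 0.0.
  • The threshold is user-chosen.
  • center_shift and drifted are hypothetical helper names.

```python
import math


def center_shift(prev: dict, curr: dict) -> dict:
    """Euclidean distance each cluster center moved between two snapshots.

    prev and curr map cluster_id -> {feature: value}. A feature missing from one
    snapshot is treated as 0.0 (assumption); clusters present in only one
    snapshot are skipped.
    """
    shifts = {}
    for cid, a in prev.items():
        if cid not in curr:
            continue
        b = curr[cid]
        keys = a.keys() | b.keys()
        shifts[cid] = math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))
    return shifts


def drifted(shifts: dict, threshold: float) -> list:
    """Cluster ids whose center moved farther than a user-chosen threshold."""
    return [cid for cid, d in shifts.items() if d > threshold]


before = {0: {0: 1.0, 1: 2.0}, 1: {0: -4.0, 1: 0.0}}
after = {0: {0: 4.0, 1: 6.0}, 1: {0: -4.0, 1: 0.0}}
shifts = center_shift(before, after)
print(shifts)                          # {0: 5.0, 1: 0.0}
print(drifted(shifts, threshold=1.0))  # [0]
```

This comparison is most meaningful for KMeans, whose cluster ids are fixed (0 to n_clusters - 1). Macro-cluster ids produced by reclustering, as in DBSTREAM, are not guaranteed to refer to the same cluster across snapshots.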

Code Reference

Source Locations

Algorithm (source location): key attributes

  • KMeans (river/cluster/k_means.py:L99-L101): centers: dict[int, defaultdict], with each feature initialized to a Gaussian random value.
  • DBSTREAM (river/cluster/dbstream.py:L135-L160): _micro_clusters: dict[int, DBSTREAMMicroCluster]. Each micro-cluster has .center, .weight, and .last_update. The model also exposes the properties micro_clusters, clusters, centers, and n_clusters.
  • DenStream (river/cluster/denstream.py:L142-L179): p_micro_clusters: dict[int, DenStreamMicroCluster] and o_micro_clusters: dict[int, DenStreamMicroCluster]. Each micro-cluster has .linear_sum, .squared_sum, .N, .creation_time, and .last_edit_time. The centers property computes fading-weighted centers.
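DenStream micro-clusters store summary statistics (.linear_sum, .squared_sum, .N) rather than an explicit center, so a monitor has to derive geometry from them. The sketch below uses two formulas:

  • center = LS / N, the usual cluster-feature center.
  • radius = sqrt(sum over features of SS/N - (LS/N)^2), one common radius definition.

Whether this matches River's own center computation exactly is an assumption, since River also applies a fading function. cf_center and cf_radius are hypothetical helpers, exercised here on a stand-in object rather than a real DenStreamMicroCluster.

```python
import math
from types import SimpleNamespace


def cf_center(mc) -> dict:
    """Center of a micro-cluster from its linear sum and count: LS / N."""
    return {k: ls / mc.N for k, ls in mc.linear_sum.items()}


def cf_radius(mc) -> float:
    """Square root of the summed per-feature variance SS/N - (LS/N)^2."""
    var = 0.0
    for k, ls in mc.linear_sum.items():
        mean = ls / mc.N
        # Clamp at 0.0 so floating-point round-off cannot produce a negative variance.
        var += max(mc.squared_sum.get(k, 0.0) / mc.N - mean * mean, 0.0)
    return math.sqrt(var)


# Stand-in micro-cluster summarizing the points (1, 2) and (3, 2).
# With River you would pass an element of denstream.p_micro_clusters.values().
mc = SimpleNamespace(N=2, linear_sum={0: 4.0, 1: 4.0}, squared_sum={0: 10.0, 1: 8.0})
print(cf_center(mc))  # {0: 2.0, 1: 2.0}
print(cf_radius(mc))  # 1.0
```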

Import

from river import cluster

I/O Contract

Inputs

None. State inspection is done by reading model attributes after learn_one / predict_one calls.
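In practice these reads are interleaved with learning. The loop below is a minimal, model-agnostic sketch. It only assumes the model has a learn_one method. Each probe is a function you supply that reads some attribute, for example lambda m: len(m.micro_clusters) for DBSTREAM. monitor and CountingModel are hypothetical names used for illustration.

```python
from typing import Any, Callable, Iterable


def monitor(model: Any, xs: Iterable[dict], probes: dict[str, Callable[[Any], Any]], every: int = 1) -> list:
    """Feed each sample to model.learn_one and record every probe after each `every` samples."""
    history = []
    for step, x in enumerate(xs):
        model.learn_one(x)
        if (step + 1) % every == 0:
            row = {"step": step}
            row.update({name: probe(model) for name, probe in probes.items()})
            history.append(row)
    return history


class CountingModel:
    """Hypothetical stand-in model that only counts learn_one calls."""

    def __init__(self) -> None:
        self.n_seen = 0

    def learn_one(self, x: dict) -> None:
        self.n_seen += 1


log = monitor(CountingModel(), [{"a": i} for i in range(6)], {"n_seen": lambda m: m.n_seen}, every=2)
print(log)  # [{'step': 1, 'n_seen': 2}, {'step': 3, 'n_seen': 4}, {'step': 5, 'n_seen': 6}]
```

With River, the same call would take a clustering model, the feature dicts yielded by stream.iter_array(X), and probes such as {"n_micro": lambda m: len(m.micro_clusters)}.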

Outputs

Attribute (type): description

  • KMeans.centers (dict[int, defaultdict]): maps cluster_id to centroid position. Each centroid is a defaultdict of feature values.
  • DBSTREAM.micro_clusters (dict[int, DBSTREAMMicroCluster]): raw micro-clusters with .center (dict), .weight (float), and .last_update (int).
  • DBSTREAM.centers (dict): macro-cluster centers after reclustering.
  • DBSTREAM.n_clusters (int): number of macro-clusters.
  • DenStream.p_micro_clusters (dict[int, DenStreamMicroCluster]): potential (core) micro-clusters.
  • DenStream.o_micro_clusters (dict[int, DenStreamMicroCluster]): outlier micro-clusters.
  • DenStream.centers (dict): centers of the final macro-clusters (property, computed on access).
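A compact per-step summary of a micro-cluster collection (count, total weight, heaviest id) is often enough for a dashboard or log line. The sketch below assumes by default that each micro-cluster has a numeric .weight, as DBSTREAM micro-clusters do. For DenStream, which lists .N instead, the weight_of hook is used with N as a crude stand-in for the faded weight; that substitution is an assumption. summarize_micro_clusters is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Callable


def summarize_micro_clusters(micro_clusters: dict, weight_of: Callable[[Any], float] = lambda mc: mc.weight) -> dict:
    """Count, total weight, and heaviest id for an {id: micro_cluster} mapping."""
    if not micro_clusters:
        return {"count": 0, "total_weight": 0.0, "heaviest": None}
    weights = {mid: weight_of(mc) for mid, mc in micro_clusters.items()}
    heaviest = max(weights, key=weights.get)
    return {"count": len(weights), "total_weight": sum(weights.values()), "heaviest": heaviest}


# Stand-in micro-clusters with a .weight attribute, shaped like DBSTREAM micro-clusters.
mcs = {0: SimpleNamespace(weight=2.5), 3: SimpleNamespace(weight=1.0)}
print(summarize_micro_clusters(mcs))  # {'count': 2, 'total_weight': 3.5, 'heaviest': 0}

# DenStream-style stand-in: use N as a crude proxy for weight.
print(summarize_micro_clusters({7: SimpleNamespace(N=4)}, weight_of=lambda mc: mc.N))
```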

Usage Examples

Inspecting KMeans centers over time:

from river import cluster, stream

k_means = cluster.KMeans(n_clusters=3, halflife=0.5, seed=42)

X = [
    [1, 2], [1, 4], [1, 0],
    [-4, 2], [-4, 4], [-4, 0],
    [5, 0], [5, 2], [5, 4]
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    k_means.learn_one(x)

    # Inspect centers after each observation
    print(f'After point {i}: centers = {{')
    for cid, center in k_means.centers.items():
        print(f'  {cid}: {dict(center)}')
    print('}')

Monitoring DBSTREAM micro-cluster count:

from river import cluster, stream

dbstream = cluster.DBSTREAM(
    clustering_threshold=1.5,
    fading_factor=0.05,
    cleanup_interval=4,
    minimum_weight=1
)

X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3],
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    dbstream.learn_one(x)
    # Number of raw micro-clusters currently held by the model
    n_micro = len(dbstream.micro_clusters)
    print(f'Step {i}: {n_micro} micro-clusters')
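The micro-cluster count printed above can go down as well as up. In the DBSTREAM algorithm, a cleanup pass runs every cleanup_interval time steps and removes micro-clusters whose weight has faded too far. The sketch below computes a faded weight with the exponential fading function from the DBSTREAM paper, w * 2^(-fading_factor * elapsed). It also solves for how many idle steps a weight survives before dropping below a threshold. Plugging River's .weight and .last_update values directly into these formulas is an assumption. faded_weight and steps_until_below are hypothetical helpers.

```python
import math


def faded_weight(weight: float, last_update: int, now: int, fading_factor: float) -> float:
    """Weight after decaying by 2 ** (-fading_factor * elapsed) since last_update."""
    elapsed = now - last_update
    return weight * 2.0 ** (-fading_factor * elapsed)


def steps_until_below(weight: float, threshold: float, fading_factor: float) -> float:
    """Idle steps until weight * 2 ** (-fading_factor * t) reaches threshold."""
    return math.log2(weight / threshold) / fading_factor


# With fading_factor=0.05 (as in the example above), weight halves every 20 idle steps.
print(faded_weight(4.0, last_update=0, now=20, fading_factor=0.05))  # 2.0
print(steps_until_below(4.0, threshold=1.0, fading_factor=0.05))     # about 40
```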

Tracking DenStream potential vs outlier micro-clusters:

from river import cluster, stream

denstream = cluster.DenStream(
    decaying_factor=0.01,
    beta=0.5,
    mu=2.5,
    epsilon=0.5,
    n_samples_init=10
)

X = [
    [-1, -0.5], [-1, -0.625], [-1, -0.75], [-1, -1],
    [-1, -1.125], [-1, -1.25], [-1.5, -0.5], [-1.5, -0.625],
    [-1.5, -0.75], [-1.5, -1], [-1.5, -1.125], [-1.5, -1.25],
    [1, 1.5], [1, 1.75], [1, 2],
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    denstream.learn_one(x)
    # Potential (core) micro-clusters vs. outlier micro-clusters
    n_p = len(denstream.p_micro_clusters)
    n_o = len(denstream.o_micro_clusters)
    print(f'Step {i}: {n_p} potential, {n_o} outlier micro-clusters')
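In the DenStream algorithm, a micro-cluster is potential when its faded weight is at least beta * mu and an outlier otherwise. With beta=0.5 and mu=2.5 as above, the threshold is 1.25. The sketch below applies that rule to a mapping of weights. It is an assumption that River's internal thresholds follow the paper exactly, and split_by_weight is a hypothetical helper. River's DenStream also buffers the first n_samples_init points before building any micro-clusters, so the first several steps of the loop above will likely report zero of each.

```python
def split_by_weight(weights: dict, beta: float, mu: float) -> tuple[list, list]:
    """Split {id: weight} into (potential_ids, outlier_ids) using the beta * mu threshold."""
    threshold = beta * mu
    potential = [mid for mid, w in weights.items() if w >= threshold]
    outlier = [mid for mid, w in weights.items() if w < threshold]
    return potential, outlier


p_ids, o_ids = split_by_weight({0: 3.0, 1: 1.25, 2: 0.6}, beta=0.5, mu=2.5)
print(p_ids, o_ids)  # [0, 1] [2]
```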
