Implementation:Online ml River Cluster Centers Inspection

Knowledge Sources	Domains	Last Updated
River, River Docs	Online Clustering, Model Inspection, Concept Drift	2026-02-08 16:00 GMT

Overview

Documents the pattern for accessing and inspecting cluster model state -- centroids, micro-cluster collections, and weights -- to monitor how cluster structures evolve over time in River's online clustering algorithms.

Description

This is a Pattern Doc that documents how to inspect the evolving internal state of River's clustering algorithms. Each algorithm exposes different attributes that reveal the current cluster structure:

KMeans exposes model.centers -- a dict[int, defaultdict] mapping cluster IDs to centroid positions. Each centroid is a defaultdict that lazily initializes unseen feature dimensions.
DBSTREAM exposes model.micro_clusters (raw micro-clusters with centers and weights), model.clusters (macro-clusters after reclustering), and model.centers (macro-cluster centers). The shared density graph model.s tracks inter-micro-cluster density relationships.
DenStream exposes model.p_micro_clusters (potential/core micro-clusters) and model.o_micro_clusters (outlier micro-clusters). Each micro-cluster has attributes for linear sum, squared sum, count, creation time, and last edit time.
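
As a minimal sketch of reading these attributes generically, the hypothetical helper `describe_state` below reports structure counts by dispatching on attribute names alone. This is a duck-typing assumption, not a River API. `SimpleNamespace` stand-ins mimic the attribute layout so the snippet runs without River. A real `cluster.KMeans`, `cluster.DBSTREAM`, or `cluster.DenStream` instance can be passed instead.

```python
from types import SimpleNamespace


def describe_state(model):
    """Summarize a clusterer's structure from its exposed attributes (hypothetical helper).

    Dispatch is by attribute name only: DenStream-style models expose
    p_micro_clusters/o_micro_clusters, DBSTREAM-style models expose
    micro_clusters, and KMeans-style models expose centers.
    """
    if hasattr(model, "p_micro_clusters"):
        return {
            "kind": "DenStream",
            "n_potential": len(model.p_micro_clusters),
            "n_outlier": len(model.o_micro_clusters),
        }
    if hasattr(model, "micro_clusters"):
        return {"kind": "DBSTREAM", "n_micro": len(model.micro_clusters)}
    if hasattr(model, "centers"):
        return {"kind": "KMeans", "n_centers": len(model.centers)}
    raise TypeError(f"No known cluster-state attributes on {type(model).__name__}")


# Stand-in objects that mimic the attribute layout (no River needed to run this).
fake_kmeans = SimpleNamespace(centers={0: {"x": 1.0}, 1: {"x": -1.0}})
fake_denstream = SimpleNamespace(p_micro_clusters={0: None}, o_micro_clusters={})
print(describe_state(fake_kmeans))
print(describe_state(fake_denstream))
```

`centers` is checked last because DBSTREAM and DenStream also expose a `centers` attribute, so checking it first would misclassify them.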

By reading these attributes at regular intervals during the learn/predict loop, users can track how the number, positions, and weights of clusters change over time, for example to spot clusters that appear, fade, or drift.
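
The following is a minimal sketch of periodic snapshotting. It assumes centers are shaped like `KMeans.centers`, with each cluster ID mapped to a feature-to-value mapping. The hypothetical `CenterHistory` class copies each centroid into a plain dict when it is recorded, so later in-place updates by learn_one do not rewrite earlier snapshots. As a simple drift signal, it also reports how far each center moved between two recorded steps.

```python
import math


class CenterHistory:
    """Keeps copied center snapshots keyed by step (hypothetical helper)."""

    def __init__(self):
        # step -> {cluster_id: {feature: value}}
        self.snapshots = {}

    def record(self, step, centers):
        # dict(c) copies the current features into a new plain dict (dropping any
        # defaultdict factory), so later in-place updates by the model do not
        # leak into this snapshot. Values are assumed to be immutable numbers.
        self.snapshots[step] = {cid: dict(c) for cid, c in centers.items()}

    def displacement(self, step_a, step_b):
        """Euclidean distance each cluster's center moved from step_a to step_b.

        Only clusters present in both snapshots are compared; a feature missing
        from one snapshot is treated as 0.0 (an assumption of this sketch).
        """
        a, b = self.snapshots[step_a], self.snapshots[step_b]
        moves = {}
        for cid in a.keys() & b.keys():
            features = a[cid].keys() | b[cid].keys()
            moves[cid] = math.sqrt(
                sum((b[cid].get(f, 0.0) - a[cid].get(f, 0.0)) ** 2 for f in features)
            )
        return moves
```

With River, calling `history.record(i, k_means.centers)` inside a learn loop and then `history.displacement(first_step, last_step)` gives a per-cluster shift over that window.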

Usage

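Reference this pattern when you need to build monitoring, visualization, or drift-detection logic around River's online clustering models. Access the attributes directly on the model instance after calling learn_one or predict_one. The attributes are references to the model's internal state, which learn_one updates in place. To keep a value, copy it (for example, dict(center) for a KMeans centroid) rather than storing the object itself. Do not modify these objects, because changing them changes the model.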
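
The learn-then-inspect loop can be factored into a small driver. This is a hedged sketch. `learn_and_inspect` and the `inspect(step, model)` callback are hypothetical names, and the only assumption about the model is that it has a `learn_one(x)` method.

```python
def learn_and_inspect(model, xs, inspect, every=1):
    """Train on each x, then call inspect(step, model) every `every` steps.

    Hypothetical helper: whatever `inspect` returns is collected and returned.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    results = []
    for step, x in enumerate(xs):
        model.learn_one(x)
        # Inspect only after the update for this step has been applied.
        if (step + 1) % every == 0:
            results.append(inspect(step, model))
    return results
```

With River, `learn_and_inspect(dbstream, (x for x, _ in stream.iter_array(X)), lambda step, m: len(m.micro_clusters), every=2)` would collect the micro-cluster count after every second point.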

Code Reference

Source Locations

Algorithm	Source	Key Attribute
KMeans	`river/cluster/k_means.py:L99-L101`	`centers: dict[int, defaultdict]` -- initialized with Gaussian random values per feature.
DBSTREAM	`river/cluster/dbstream.py:L135-L160`	`_micro_clusters: dict[int, DBSTREAMMicroCluster]` -- each has `.center`, `.weight`, `.last_update`. Also exposes properties: `micro_clusters`, `clusters`, `centers`, `n_clusters`.
DenStream	`river/cluster/denstream.py:L142-L179`	`p_micro_clusters: dict[int, DenStreamMicroCluster]` and `o_micro_clusters: dict[int, DenStreamMicroCluster]`. Each has `.linear_sum`, `.squared_sum`, `.N`, `.creation_time`, `.last_edit_time`. `centers` property computes fading-weighted centers.
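
The cluster-feature attributes of a DenStream micro-cluster can be turned into a center and a spread estimate. This minimal sketch assumes that `.linear_sum` and `.squared_sum` are dicts keyed by feature and that `.N` is the (possibly faded) weight. The hypothetical `cf_center_and_variance` applies the textbook identities center = LS / N and variance = SS / N - center^2. In the DenStream paper, fading scales LS, SS, and N by the same factor, so these ratios are unaffected by it. River's own `centers` property may compute centers differently, so treat this as an inspection aid rather than a replacement.

```python
from types import SimpleNamespace


def cf_center_and_variance(mc):
    """Per-feature center and variance from a micro-cluster's cluster feature (hypothetical helper)."""
    if mc.N <= 0:
        raise ValueError("micro-cluster has no weight")
    center = {f: ls / mc.N for f, ls in mc.linear_sum.items()}
    variance = {
        # max(..., 0.0) guards against tiny negative values from float rounding.
        f: max(mc.squared_sum.get(f, 0.0) / mc.N - c * c, 0.0)
        for f, c in center.items()
    }
    return center, variance


# Stand-in for a micro-cluster that absorbed the 1-D points 1.0 and 3.0 without fading.
example_mc = SimpleNamespace(N=2, linear_sum={0: 4.0}, squared_sum={0: 10.0})
print(cf_center_and_variance(example_mc))
```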

Import

from river import cluster

I/O Contract

Inputs

No specific inputs -- state inspection is done by reading model attributes after learn_one / predict_one calls.

Outputs

Attribute	Type	Description
KMeans.centers	`dict[int, defaultdict]`	Maps cluster_id to centroid position. Each centroid is a defaultdict of feature values.
DBSTREAM.micro_clusters	`dict[int, DBSTREAMMicroCluster]`	Raw micro-clusters with `.center` (dict), `.weight` (float), `.last_update` (int).
DBSTREAM.centers	`dict`	Macro-cluster centers after reclustering.
DBSTREAM.n_clusters	`int`	Number of macro-clusters.
DenStream.p_micro_clusters	`dict[int, DenStreamMicroCluster]`	Potential (core) micro-clusters.
DenStream.o_micro_clusters	`dict[int, DenStreamMicroCluster]`	Outlier micro-clusters.
DenStream.centers	`dict`	Centers of the final macro-clusters (property, computed on access).
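
For DBSTREAM, the micro-cluster outputs above can be flattened into rows for logging or plotting. The following is a minimal sketch using a hypothetical helper, `micro_cluster_rows`. It reads only `.center`, `.weight`, and `.last_update` as listed in the table, and the `min_weight` filter is an illustrative choice. DBSTREAM decays weights over time, and a stored `.weight` may reflect the value as of `.last_update` rather than the current step. Keep that caveat in mind when comparing weights.

```python
from types import SimpleNamespace


def micro_cluster_rows(micro_clusters, min_weight=0.0):
    """Flatten DBSTREAM-style micro-clusters into rows sorted by descending weight."""
    rows = [
        {
            "id": mc_id,
            "center": dict(mc.center),  # copy, so the row does not track later updates
            "weight": mc.weight,
            "last_update": mc.last_update,
        }
        for mc_id, mc in micro_clusters.items()
        if mc.weight >= min_weight
    ]
    rows.sort(key=lambda r: r["weight"], reverse=True)
    return rows


# Stand-ins shaped like the table's micro-cluster attributes.
example_mcs = {
    0: SimpleNamespace(center={0: 1.0, 1: 0.7}, weight=2.5, last_update=4),
    1: SimpleNamespace(center={0: 4.0, 1: 2.4}, weight=0.4, last_update=7),
    2: SimpleNamespace(center={0: 4.0, 1: 3.0}, weight=3.1, last_update=8),
}
for row in micro_cluster_rows(example_mcs, min_weight=1.0):
    print(row)
```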

Usage Examples

Inspecting KMeans centers over time:

from river import cluster, stream

k_means = cluster.KMeans(n_clusters=3, halflife=0.5, seed=42)

X = [
    [1, 2], [1, 4], [1, 0],
    [-4, 2], [-4, 4], [-4, 0],
    [5, 0], [5, 2], [5, 4]
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    k_means.learn_one(x)

    # Inspect centers after each observation
    print(f'After point {i}: centers = {{')
    for cid, center in k_means.centers.items():
        print(f'  {cid}: {dict(center)}')
    print('}')
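
Centers alone do not show how points are distributed among clusters. As a hedged sketch, the hypothetical `cluster_sizes` below counts predict_one assignments for a batch of points. It assumes only a `predict_one(x)` method that returns a cluster ID, and it never calls learn_one.

```python
from collections import Counter


def cluster_sizes(model, xs):
    """Count how many points in `xs` the model currently assigns to each cluster ID."""
    return Counter(model.predict_one(x) for x in xs)
```

For example, after the loop above, `cluster_sizes(k_means, [x for x, _ in stream.iter_array(X)])` reports how the nine points are currently split across the three centers.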

Monitoring DBSTREAM micro-cluster count:

from river import cluster, stream

dbstream = cluster.DBSTREAM(
    clustering_threshold=1.5,
    fading_factor=0.05,
    cleanup_interval=4,
    minimum_weight=1
)

X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3],
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    dbstream.learn_one(x)
    n_micro = len(dbstream.micro_clusters)
    print(f'Step {i}: {n_micro} micro-clusters')
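
Printing counts works for short streams. For longer ones, it helps to keep only the steps where the count changes. In this minimal sketch, you append `n_micro` to a list inside the loop above and pass that list to the hypothetical helper `count_change_points`. Drops in the count would be expected to line up with DBSTREAM's cleanup passes, which run every `cleanup_interval` time steps.

```python
def count_change_points(counts):
    """Return (step, previous, current) for each step whose count differs from the step before."""
    return [
        (step, prev, curr)
        for step, (prev, curr) in enumerate(zip(counts, counts[1:]), start=1)
        if curr != prev
    ]


print(count_change_points([1, 1, 2, 2, 1, 1]))
```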

Tracking DenStream potential vs outlier micro-clusters:

from river import cluster, stream

denstream = cluster.DenStream(
    decaying_factor=0.01,
    beta=0.5,
    mu=2.5,
    epsilon=0.5,
    n_samples_init=10
)

X = [
    [-1, -0.5], [-1, -0.625], [-1, -0.75], [-1, -1],
    [-1, -1.125], [-1, -1.25], [-1.5, -0.5], [-1.5, -0.625],
    [-1.5, -0.75], [-1.5, -1], [-1.5, -1.125], [-1.5, -1.25],
    [1, 1.5], [1, 1.75], [1, 2],
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    denstream.learn_one(x)
    n_p = len(denstream.p_micro_clusters)
    n_o = len(denstream.o_micro_clusters)
    print(f'Step {i}: {n_p} potential, {n_o} outlier micro-clusters')
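
To condense the per-step DenStream output, collect `(n_p, n_o)` pairs in a list inside the loop above and summarize them. The hypothetical `summarize_denstream_counts` below reports the first step at which any micro-cluster exists, along with the peak counts. DenStream uses its first `n_samples_init` points to initialize, so the first active step may serve as a rough indicator of when that warm-up ended. This interpretation is an assumption about when the micro-cluster dicts start being populated.

```python
def summarize_denstream_counts(history):
    """Summarize per-step (n_potential, n_outlier) pairs (hypothetical helper)."""
    first_active = next(
        (step for step, (n_p, n_o) in enumerate(history) if n_p + n_o > 0), None
    )
    return {
        "first_active_step": first_active,  # None if no micro-cluster ever appeared
        "peak_potential": max((n_p for n_p, _ in history), default=0),
        "peak_outlier": max((n_o for _, n_o in history), default=0),
    }


print(summarize_denstream_counts([(0, 0), (0, 0), (2, 0), (2, 1), (3, 1)]))
```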

Related Pages

Principle:Online_ml_River_Cluster_Evolution_Monitoring
