Implementation:Online ml River Cluster Centers Inspection
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Clustering, Model Inspection, Concept Drift | 2026-02-08 16:00 GMT |
Overview
This page documents the pattern for accessing and inspecting the state of River's online clustering models (centroids, micro-cluster collections, and weights) in order to monitor how cluster structures evolve over time.
Description
This Pattern Doc describes how to inspect the evolving internal state of River's clustering algorithms. Each algorithm exposes different attributes that reveal the current cluster structure:
- KMeans exposes model.centers -- a dict[int, defaultdict] mapping cluster IDs to centroid positions. Each centroid is a defaultdict that lazily initializes unseen feature dimensions.
- DBSTREAM exposes model.micro_clusters (raw micro-clusters with centers and weights), model.clusters (macro-clusters after reclustering), and model.centers (macro-cluster centers). The shared density graph model.s tracks inter-micro-cluster density relationships.
- DenStream exposes model.p_micro_clusters (potential/core micro-clusters) and model.o_micro_clusters (outlier micro-clusters). Each micro-cluster has attributes for linear sum, squared sum, count, creation time, and last edit time.
Reading these attributes at regular intervals during the learn/predict loop yields a series of snapshots that shows how the cluster structure changes over time.
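Since the centroid containers may be updated in place by later learn_one calls, a snapshot kept for later comparison is safer when copied first. The following is a minimal sketch of that idea: snapshot_centers is a hypothetical helper, not part of River, and the defaultdict below only stands in for a live model.centers mapping of the shape described above.

```python
from collections import defaultdict


def snapshot_centers(centers):
    """Copy a {cluster_id: {feature: value}} mapping into plain dicts.

    Hypothetical helper (not a River API): the copy is detached from the
    live model state, so later updates do not alter stored snapshots.
    """
    return {cid: dict(center) for cid, center in centers.items()}


# Stand-in for a live `model.centers` mapping (assumed shape).
live = {0: defaultdict(float, {0: 1.0, 1: 2.0})}
history = [snapshot_centers(live)]

live[0][0] = 5.0  # simulate an in-place centroid update
history.append(snapshot_centers(live))

# The first snapshot still holds the old position.
print(history[0], history[1])
```

With a real model, the same helper could be called as snapshot_centers(model.centers) every N observations to build the history.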
Usage
Reference this pattern when you need to build monitoring, visualization, or drift-detection logic around River's online clustering models. Access the attributes directly on the model instance after calling learn_one or predict_one.
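For drift-detection logic, one simple signal is how far each centroid has moved between two snapshots. The sketch below assumes snapshots are plain {cluster_id: {feature: value}} dicts; centroid_shifts and the threshold value are hypothetical illustrations, not River APIs.

```python
import math


def centroid_shifts(prev, curr):
    """Euclidean distance moved by each cluster ID present in both snapshots.

    Hypothetical helper: a feature missing from one snapshot is treated
    as 0.0, which is an assumption of this sketch.
    """
    shifts = {}
    for cid in prev.keys() & curr.keys():
        features = prev[cid].keys() | curr[cid].keys()
        shifts[cid] = math.sqrt(
            sum((curr[cid].get(f, 0.0) - prev[cid].get(f, 0.0)) ** 2 for f in features)
        )
    return shifts


prev = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 1.0}}
curr = {0: {0: 3.0, 1: 4.0}, 1: {0: 1.0, 1: 1.0}}

shifts = centroid_shifts(prev, curr)
THRESHOLD = 1.0  # arbitrary illustrative value
moved = sorted(cid for cid, d in shifts.items() if d > THRESHOLD)
print(shifts, moved)
```

A suitable threshold depends on the feature scale and on how often snapshots are taken, so it would need tuning per data stream.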
Code Reference
Source Locations
| Algorithm | Source | Key Attribute |
|---|---|---|
| KMeans | river/cluster/k_means.py:L99-L101 | centers: dict[int, defaultdict] -- initialized with Gaussian random values per feature. |
| DBSTREAM | river/cluster/dbstream.py:L135-L160 | _micro_clusters: dict[int, DBSTREAMMicroCluster] -- each has .center, .weight, .last_update. Also exposes properties: micro_clusters, clusters, centers, n_clusters. |
| DenStream | river/cluster/denstream.py:L142-L179 | p_micro_clusters: dict[int, DenStreamMicroCluster] and o_micro_clusters: dict[int, DenStreamMicroCluster]. Each has .linear_sum, .squared_sum, .N, .creation_time, .last_edit_time. The centers property computes fading-weighted centers. |
Import
from river import cluster
I/O Contract
Inputs
No specific inputs -- state inspection is done by reading model attributes after learn_one / predict_one calls.
Outputs
| Attribute | Type | Description |
|---|---|---|
| KMeans.centers | dict[int, defaultdict] | Maps cluster_id to centroid position. Each centroid is a defaultdict of feature values. |
| DBSTREAM.micro_clusters | dict[int, DBSTREAMMicroCluster] | Raw micro-clusters with .center (dict), .weight (float), .last_update (int). |
| DBSTREAM.centers | dict | Macro-cluster centers after reclustering. |
| DBSTREAM.n_clusters | int | Number of macro-clusters. |
| DenStream.p_micro_clusters | dict[int, DenStreamMicroCluster] | Potential (core) micro-clusters. |
| DenStream.o_micro_clusters | dict[int, DenStreamMicroCluster] | Outlier micro-clusters. |
| DenStream.centers | dict | Centers of the final macro-clusters (property, computed on access). |
Usage Examples
Inspecting KMeans centers over time:
from river import cluster, stream

k_means = cluster.KMeans(n_clusters=3, halflife=0.5, seed=42)

X = [
    [1, 2], [1, 4], [1, 0],
    [-4, 2], [-4, 4], [-4, 0],
    [5, 0], [5, 2], [5, 4]
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    k_means.learn_one(x)
    # Inspect centers after each observation
    print(f'After point {i}: centers = {{')
    for cid, center in k_means.centers.items():
        # Convert each defaultdict to a plain dict for readable output
        print(f'  {cid}: {dict(center)}')
    print('}')
Monitoring DBSTREAM micro-cluster count:
from river import cluster, stream

dbstream = cluster.DBSTREAM(
    clustering_threshold=1.5,
    fading_factor=0.05,
    cleanup_interval=4,
    minimum_weight=1
)

X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3],
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    dbstream.learn_one(x)
    # Count the raw micro-clusters after each observation
    n_micro = len(dbstream.micro_clusters)
    print(f'Step {i}: {n_micro} micro-clusters')
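Beyond counting micro-clusters, the .center, .weight, and .last_update attributes listed in the Outputs table can be used to see which micro-clusters currently dominate. The sketch below ranks them by weight; heaviest_micro_clusters is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic those three attributes rather than real DBSTREAMMicroCluster instances.

```python
from types import SimpleNamespace


def heaviest_micro_clusters(micro_clusters, top_n=2):
    """Return (id, weight, center) tuples for the heaviest micro-clusters."""
    ranked = sorted(
        micro_clusters.items(), key=lambda kv: kv[1].weight, reverse=True
    )
    return [(mc_id, mc.weight, dict(mc.center)) for mc_id, mc in ranked[:top_n]]


# Stand-in objects with made-up values (not output from a real model).
fake = {
    0: SimpleNamespace(center={0: 1.0, 1: 0.7}, weight=3.2, last_update=7),
    1: SimpleNamespace(center={0: 4.0, 1: 2.3}, weight=1.1, last_update=7),
    2: SimpleNamespace(center={0: 9.0, 1: 9.0}, weight=0.4, last_update=2),
}

top = heaviest_micro_clusters(fake)
for mc_id, weight, center in top:
    print(f'micro-cluster {mc_id}: weight={weight}, center={center}')
```

With a real model, the call would presumably be heaviest_micro_clusters(dbstream.micro_clusters) inside the loop above.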
Tracking DenStream potential vs outlier micro-clusters:
from river import cluster, stream

denstream = cluster.DenStream(
    decaying_factor=0.01,
    beta=0.5,
    mu=2.5,
    epsilon=0.5,
    n_samples_init=10
)

X = [
    [-1, -0.5], [-1, -0.625], [-1, -0.75], [-1, -1],
    [-1, -1.125], [-1, -1.25], [-1.5, -0.5], [-1.5, -0.625],
    [-1.5, -0.75], [-1.5, -1], [-1.5, -1.125], [-1.5, -1.25],
    [1, 1.5], [1, 1.75], [1, 2],
]

for i, (x, _) in enumerate(stream.iter_array(X)):
    denstream.learn_one(x)
    # Compare the sizes of the potential and outlier buffers
    n_p = len(denstream.p_micro_clusters)
    n_o = len(denstream.o_micro_clusters)
    print(f'Step {i}: {n_p} potential, {n_o} outlier micro-clusters')
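The per-micro-cluster timestamps can also be inspected, for example to list micro-clusters that have not absorbed a point recently. The sketch below is a hedged illustration: idle_micro_clusters is a hypothetical helper, now is a caller-supplied step counter (whether it lines up with DenStream's internal clock is an assumption), and the SimpleNamespace objects stand in for real micro-clusters, mimicking only .creation_time and .last_edit_time.

```python
from types import SimpleNamespace


def idle_micro_clusters(micro_clusters, now, max_idle):
    """IDs of micro-clusters whose last edit is more than `max_idle` steps old."""
    return sorted(
        mc_id
        for mc_id, mc in micro_clusters.items()
        if now - mc.last_edit_time > max_idle
    )


# Stand-in objects with made-up timestamps (not output from a real model).
fake_outliers = {
    0: SimpleNamespace(creation_time=1, last_edit_time=2),
    1: SimpleNamespace(creation_time=5, last_edit_time=14),
}

idle = idle_micro_clusters(fake_outliers, now=15, max_idle=5)
print(f'Idle micro-clusters: {idle}')
```

Applied to o_micro_clusters, a growing list of idle entries may indicate outliers that never developed into a dense region.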