Implementation:Online ml River Clusterer Learn Predict
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Clustering, API Design, Abstract Interfaces | 2026-02-08 16:00 GMT |
Overview
Concrete documentation of the base.Clusterer abstract base class that defines the learn_one/predict_one interface all River clustering implementations must follow.
Description
The base.Clusterer class is the abstract foundation for every clustering algorithm in River. It inherits from base.Estimator and declares two abstract methods: learn_one(x), which updates the model incrementally, and predict_one(x), which assigns x to a cluster. It also overrides the _supervised property to return False, which tells the framework that clustering is an unsupervised task.
All clustering implementations in River -- including cluster.KMeans, cluster.DBSTREAM, cluster.DenStream, cluster.CluStream, cluster.STREAMKMeans, and cluster.TextClust -- subclass base.Clusterer and provide concrete implementations of these methods.
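The sketch below shows what the two abstract methods enforce, using Python's standard abc module. MiniClusterer, Incomplete, and SingleCluster are hypothetical names. MiniClusterer only mimics the shape of base.Clusterer so the snippet runs without River installed. It is not River's source.

```python
import abc
from typing import Any


class MiniClusterer(abc.ABC):
    """Hypothetical stand-in with the same abstract methods as base.Clusterer."""

    @property
    def _supervised(self) -> bool:
        # Unsupervised: no target y is ever passed to learn_one.
        return False

    @abc.abstractmethod
    def learn_one(self, x: dict[Any, Any]) -> None:
        """Update the model with a set of features x."""

    @abc.abstractmethod
    def predict_one(self, x: dict[Any, Any]) -> int:
        """Predict the cluster number for a set of features x."""


class Incomplete(MiniClusterer):
    # Implements learn_one but not predict_one, so the class stays abstract.
    def learn_one(self, x):
        pass


class SingleCluster(MiniClusterer):
    # Implements both methods: every point is assigned to cluster 0.
    def learn_one(self, x):
        pass

    def predict_one(self, x):
        return 0


try:
    Incomplete()
except TypeError as err:
    print("Incomplete cannot be instantiated:", err)

model = SingleCluster()
model.learn_one({"a": 1.0})
print(model.predict_one({"a": 1.0}), model._supervised)  # 0 False
```

Python refuses to instantiate a subclass until every abstract method is overridden. This is how base.Clusterer guarantees that any concrete clusterer can be driven through learn_one and predict_one.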
This is a Pattern Doc that documents the base class interface rather than a specific algorithm.
Usage
Reference the base.Clusterer interface when implementing a new clustering algorithm, when writing generic code that works with any River clusterer, or when understanding the type hierarchy of River's clustering module.
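The sketch below implements a new algorithm against this interface. It uses the classic leader (sequential threshold) algorithm: a point farther than threshold from every existing leader becomes a new leader, and predict_one returns the label of the nearest leader. LeaderClusterer and threshold are hypothetical names and are not part of River. The class is kept standalone so it runs without River. In River code it would subclass base.Clusterer.

```python
import math
from collections.abc import Hashable
from typing import Any


# Hypothetical example, not a River class. In River code this would be
# declared as `class LeaderClusterer(base.Clusterer):` after `from river import base`.
class LeaderClusterer:
    """Leader-algorithm clusterer following the learn_one/predict_one contract."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold
        self.leaders: dict[int, dict[Hashable, Any]] = {}

    @staticmethod
    def _distance(a: dict, b: dict) -> float:
        # Euclidean distance over the union of feature names; missing features count as 0.
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

    def _nearest(self, x: dict) -> tuple:
        # (label, distance) of the closest leader, or (None, inf) when there are no leaders yet.
        if not self.leaders:
            return None, math.inf
        label = min(self.leaders, key=lambda c: self._distance(x, self.leaders[c]))
        return label, self._distance(x, self.leaders[label])

    def learn_one(self, x: dict[Hashable, Any]) -> None:
        # Updates state in place and returns nothing, as the Clusterer contract requires.
        label, dist = self._nearest(x)
        if label is None or dist > self.threshold:
            # The point is far from every leader, so it becomes a new leader.
            self.leaders[len(self.leaders)] = dict(x)

    def predict_one(self, x: dict[Hashable, Any]) -> int:
        # Integer label of the nearest leader; 0 before anything has been learned.
        label, _ = self._nearest(x)
        return 0 if label is None else label


model = LeaderClusterer(threshold=1.5)
for row in [[1, 2], [1, 3], [5, 5], [5, 6]]:
    model.learn_one(dict(enumerate(row)))
print(model.predict_one({0: 1, 1: 2}), model.predict_one({0: 5, 1: 5.5}))  # 0 1
```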
Code Reference
Source Location
river/base/clusterer.py:L9-L41
Signature
```python
# Imports needed to run this excerpt on its own
import abc
from typing import Any

from river.base import estimator, typing


class Clusterer(estimator.Estimator):
    """A clustering model."""

    @property
    def _supervised(self) -> bool:
        return False

    @abc.abstractmethod
    def learn_one(self, x: dict[typing.FeatureName, Any]) -> None:
        """Update the model with a set of features x."""

    @abc.abstractmethod
    def predict_one(self, x: dict[typing.FeatureName, Any]) -> int:
        """Predicts the cluster number for a set of features x."""
```
Import
```python
from river import base
```
Implementations
All of the following classes inherit from base.Clusterer:
| Class | Module | Description |
|---|---|---|
| KMeans | river.cluster | Incremental K-Means with exponential moving average updates. |
| DBSTREAM | river.cluster | Density-based clustering with shared density graph. |
| DenStream | river.cluster | Density-based clustering with potential/outlier micro-clusters. |
| CluStream | river.cluster | Temporal micro-cluster framework with periodic K-Means macro-clustering. |
| STREAMKMeans | river.cluster | Chunk-based streaming K-Means. |
| TextClust | river.cluster | TF-IDF-based text stream clustering. |
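The KMeans row describes center updates as an exponential moving average. The sketch below illustrates that idea under one assumption: only the nearest center moves, and it moves a fixed fraction halflife of the way toward each new point. The update rule is c ← c + halflife·(x − c), which is equivalent to c ← (1 − halflife)·c + halflife·x. The name halflife matches a parameter of cluster.KMeans. The functions nearest_center and ema_update are hypothetical, and this is a simplified illustration rather than River's implementation.

```python
import math


def nearest_center(centers: dict[int, dict[int, float]], x: dict[int, float]) -> int:
    # Label of the center closest to x in Euclidean distance, over the features present in x.
    return min(centers, key=lambda c: math.dist([centers[c][i] for i in x], list(x.values())))


def ema_update(centers: dict[int, dict[int, float]], x: dict[int, float], halflife: float = 0.5) -> int:
    # Move only the winning center a fraction `halflife` of the way toward x.
    c = nearest_center(centers, x)
    for i, xi in x.items():
        centers[c][i] += halflife * (xi - centers[c][i])
    return c


centers = {0: {0: 0.0, 1: 0.0}, 1: {0: 10.0, 1: 10.0}}
for _ in range(3):
    ema_update(centers, {0: 2.0, 1: 2.0})
print(centers)  # {0: {0: 1.75, 1: 1.75}, 1: {0: 10.0, 1: 10.0}}
```

Each repeated update halves the remaining gap between the winning center and the point (0 → 1.0 → 1.5 → 1.75). As a result, recent points carry exponentially more weight than older ones.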
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| x | dict[FeatureName, Any] | A dictionary mapping feature names (typically strings or integers; any hashable value is allowed) to feature values. This is the universal input format for all River clusterers. |
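The input dictionary is usually built from a positional row or from named columns, as sketched below. The positional case produces the integer keys used by this page's examples (such as {0: 1, 1: 2}). The names width and height are hypothetical.

```python
row = [1, 2]

# Positional features: integer feature names 0..n-1.
x_positional = dict(enumerate(row))
print(x_positional)  # {0: 1, 1: 2}

# Named features: string feature names (the names here are hypothetical).
x_named = dict(zip(["width", "height"], row))
print(x_named)  # {'width': 1, 'height': 2}
```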
Outputs
| Output | Type | Description |
|---|---|---|
| learn_one return | None | The method updates internal state in place and returns nothing. |
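| predict_one return | int | An integer cluster label whose range depends on the algorithm. For example, cluster.KMeans returns the index of the nearest center (0 to n_clusters - 1), whereas density-based methods such as cluster.DBSTREAM derive the number of clusters from the data. |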
Usage Examples
Generic clustering loop using the Clusterer interface:
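```python
from river import cluster, metrics, stream

# learn_one/predict_one work with any Clusterer subclass; metrics.Silhouette
# additionally needs the model's `centers` attribute, which KMeans provides.
model = cluster.KMeans(n_clusters=3, seed=42)
metric = metrics.Silhouette()

X = [
    [1, 2], [1, 4], [1, 0],
    [4, 2], [4, 4], [4, 0],
    [-2, 2], [-2, 4], [-2, 0],
]

for x, _ in stream.iter_array(X):
    model.learn_one(x)             # update the model with the new point
    y_pred = model.predict_one(x)  # assign the point to a cluster
    metric.update(x, y_pred, model.centers)

print(metric)
```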
Swapping algorithms with the same interface:
```python
from river import cluster, stream

# These can be used interchangeably in the same loop
algorithms = [
    cluster.KMeans(n_clusters=3, seed=0),
    cluster.DBSTREAM(clustering_threshold=1.5),
    cluster.STREAMKMeans(chunk_size=5, n_clusters=3, seed=0),
]

X = [[1, 2], [1, 4], [-4, 2], [-4, 4], [5, 0], [5, 2]]

for algo in algorithms:
    for x, _ in stream.iter_array(X):
        algo.learn_one(x)
    # stream.iter_array uses integer feature names, so [1, 2] becomes {0: 1, 1: 2}
    print(f"{algo.__class__.__name__}: cluster for [1, 2] = {algo.predict_one({0: 1, 1: 2})}")
```
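Because the loops on this page rely only on learn_one and predict_one, generic helper code can be typed structurally. The sketch below does this with typing.Protocol. SupportsClustering, learn_then_predict, and SignOfFirstFeature are hypothetical names. A River clusterer passed to the helper would satisfy the protocol by shape alone; River does not declare this protocol.

```python
from collections.abc import Iterable
from typing import Any, Protocol


class SupportsClustering(Protocol):
    """Hypothetical structural type: anything with the learn_one/predict_one shape."""

    def learn_one(self, x: dict[Any, Any]) -> None: ...

    def predict_one(self, x: dict[Any, Any]) -> int: ...


def learn_then_predict(model: SupportsClustering, rows: Iterable[list[float]]) -> list[int]:
    # Same pattern as the loops above: update on each point, then label it.
    labels = []
    for row in rows:
        x = dict(enumerate(row))
        model.learn_one(x)
        labels.append(model.predict_one(x))
    return labels


class SignOfFirstFeature:
    # Toy stand-in model (not a River class): cluster 1 if feature 0 is positive, else 0.
    def learn_one(self, x):
        pass

    def predict_one(self, x):
        return int(x[0] > 0)


print(learn_then_predict(SignOfFirstFeature(), [[1, 2], [-4, 2], [5, 0]]))  # [1, 0, 1]
```

The Protocol lets a type checker verify that whatever is passed in exposes both methods, without requiring an isinstance check against base.Clusterer.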