Implementation:Online ml River Clusterer Learn Predict

Knowledge Sources	Domains	Last Updated
River River Docs	Online Clustering, API Design, Abstract Interfaces	2026-02-08 16:00 GMT

Overview

Reference documentation for the base.Clusterer abstract base class, which defines the learn_one/predict_one interface that all River clustering implementations must follow.

Description

The base.Clusterer class is the abstract foundation for every clustering algorithm in River. It inherits from base.Estimator and declares two abstract methods: learn_one(x) for incremental model updates and predict_one(x) for cluster assignment. Its _supervised property returns False, which signals to the framework that clustering is an unsupervised task.

All clustering implementations in River -- including cluster.KMeans, cluster.DBSTREAM, cluster.DenStream, cluster.CluStream, cluster.STREAMKMeans, and cluster.TextClust -- subclass base.Clusterer and provide concrete implementations of these methods.

This is a Pattern Doc that documents the base class interface rather than a specific algorithm.

Usage

Reference the base.Clusterer interface when implementing a new clustering algorithm, when writing generic code that works with any River clusterer, or when understanding the type hierarchy of River's clustering module.

Code Reference

Source Location

river/base/clusterer.py:L9-L41

Signature

class Clusterer(estimator.Estimator):
    """A clustering model."""

    @property
    def _supervised(self) -> bool:
        return False

    @abc.abstractmethod
    def learn_one(self, x: dict[typing.FeatureName, Any]) -> None:
        """Update the model with a set of features x."""

    @abc.abstractmethod
    def predict_one(self, x: dict[typing.FeatureName, Any]) -> int:
        """Predicts the cluster number for a set of features x."""

Import

from river import base

Implementations

All of the following classes inherit from base.Clusterer:

Class	Module	Description
KMeans	`river.cluster`	Incremental K-Means with exponential moving average updates.
DBSTREAM	`river.cluster`	Density-based clustering with shared density graph.
DenStream	`river.cluster`	Density-based clustering with potential/outlier micro-clusters.
CluStream	`river.cluster`	Temporal micro-cluster framework with periodic K-Means macro-clustering.
STREAMKMeans	`river.cluster`	Chunk-based streaming K-Means.
TextClust	`river.cluster`	TF-IDF-based text stream clustering.

I/O Contract

Inputs

Parameter	Type	Description
x	`dict[FeatureName, Any]`	A dictionary mapping feature names (strings or integers) to feature values. This is the universal input format for all River clusterers.

Outputs

Output	Type	Description
learn_one return	`None`	The method updates internal state in-place and returns nothing.
predict_one return	`int`	An integer cluster label. The specific range depends on the algorithm.

Usage Examples

Generic clustering loop using the Clusterer interface:

from river import cluster, stream, metrics

# Any Clusterer subclass works here
model = cluster.KMeans(n_clusters=3, seed=42)
metric = metrics.Silhouette()

X = [
    [1, 2], [1, 4], [1, 0],
    [4, 2], [4, 4], [4, 0],
    [-2, 2], [-2, 4], [-2, 0]
]

for x, _ in stream.iter_array(X):
    model.learn_one(x)
    y_pred = model.predict_one(x)
    metric.update(x, y_pred, model.centers)

print(metric)

Swapping algorithms with the same interface:

from river import cluster, stream

# These can be used interchangeably in the same loop
algorithms = [
    cluster.KMeans(n_clusters=3, seed=0),
    cluster.DBSTREAM(clustering_threshold=1.5),
    cluster.STREAMKMeans(chunk_size=5, n_clusters=3, seed=0),
]

X = [[1, 2], [1, 4], [-4, 2], [-4, 4], [5, 0], [5, 2]]

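# Same feature keys (0 and 1) that stream.iter_array generates for X
query = {0: 1, 1: 2}

for algo in algorithms:
    # Train each model on the full stream, then query it once
    for x, _ in stream.iter_array(X):
        algo.learn_one(x)
    print(f"{type(algo).__name__}: cluster for [1, 2] = {algo.predict_one(query)}")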

Related Pages

Principle:Online_ml_River_Incremental_Clustering_Interface
