Implementation:Online ml River Base Clusterer

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Clustering, Base_Classes
Last Updated	2026-02-08 16:00 GMT

Overview

The Clusterer class is an abstract base class that defines the interface for all clustering algorithms in River.

Description

The Clusterer class extends Estimator to provide the standard interface for unsupervised clustering models in River. It defines two abstract methods that all clustering algorithms must implement: learn_one for updating the model with a single unlabeled example (features only, no target), and predict_one for assigning a cluster number to a given set of features. The _supervised property returns False, indicating that clustering is an unsupervised learning task that does not require target labels during training.
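The contract described above can be illustrated with a standard-library stand-in. This is a minimal sketch, not River's source: `ClustererSketch`, `OnlyLearns`, `SingleCluster`, and `can_instantiate` are hypothetical names introduced here, and the sketch assumes the abstract methods are enforced through Python's `abc` machinery, as the `@abc.abstractmethod` decorators in the signature suggest.

```python
import abc
from typing import Any, Hashable


class ClustererSketch(abc.ABC):
    """Hypothetical stand-in mirroring the described contract; not River's code."""

    @property
    def _supervised(self) -> bool:
        # Clustering never consumes a target label.
        return False

    @abc.abstractmethod
    def learn_one(self, x: dict[Hashable, Any]) -> None:
        """Update the model with one unlabeled example."""

    @abc.abstractmethod
    def predict_one(self, x: dict[Hashable, Any]) -> int:
        """Return the cluster number for one example."""


class OnlyLearns(ClustererSketch):
    # predict_one is missing, so this class is still abstract.
    def learn_one(self, x):
        pass


class SingleCluster(ClustererSketch):
    # Trivial but complete: every point belongs to cluster 0.
    def learn_one(self, x):
        pass

    def predict_one(self, x):
        return 0


def can_instantiate(cls) -> bool:
    """Report whether cls() succeeds or is rejected as abstract."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Under these assumptions, `can_instantiate(OnlyLearns)` is false because one abstract method is unimplemented, while `SingleCluster` can be created and used.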

Usage

Use Clusterer as the parent class when implementing new online clustering algorithms that learn from individual examples without supervision. All clusterers must implement both learn_one and predict_one methods. Cluster numbers are typically integers starting from 0, though the specific numbering scheme depends on the algorithm implementation.
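To make the numbering convention concrete, here is a small standard-library sketch of a threshold-based ("leader") clusterer whose cluster numbers are list indices and therefore start at 0. `LeaderClusterer` and its `radius` parameter are hypothetical and are not part of River; in a River project the class would presumably inherit from `Clusterer` instead of standing alone.

```python
class LeaderClusterer:
    """Hypothetical toy: a point farther than `radius` from every existing
    leader opens a new cluster. Illustrative only, not a River class."""

    def __init__(self, radius: float = 1.0):
        self.radius = radius
        self.leaders: list[dict] = []  # list index == cluster number

    def _nearest(self, x):
        # Return (cluster number, Euclidean distance) of the closest leader,
        # or (None, inf) when no leader exists yet.
        best_id, best_dist = None, float("inf")
        for cid, leader in enumerate(self.leaders):
            keys = set(x) | set(leader)
            dist = sum((x.get(k, 0.0) - leader.get(k, 0.0)) ** 2 for k in keys) ** 0.5
            if dist < best_dist:
                best_id, best_dist = cid, dist
        return best_id, best_dist

    def learn_one(self, x):
        # Features only: no target label is passed.
        _, dist = self._nearest(x)
        if dist > self.radius:
            self.leaders.append(dict(x))

    def predict_one(self, x):
        cid, _ = self._nearest(x)
        return 0 if cid is None else cid


model = LeaderClusterer(radius=1.0)
labels = []
for x in [{"x": 0.0}, {"x": 0.2}, {"x": 5.0}, {"x": 5.1}]:
    model.learn_one(x)
    labels.append(model.predict_one(x))
# labels == [0, 0, 1, 1]
```

The numbering here follows the order in which clusters are created; other algorithms may number clusters differently.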

Code Reference

Source Location

Repository: Online_ml_River
File: river/base/clusterer.py

Signature

class Clusterer(estimator.Estimator):
    """A clustering model."""

    @property
    def _supervised(self) -> bool

    @abc.abstractmethod
    def learn_one(self, x: dict[typing.FeatureName, Any]) -> None

    @abc.abstractmethod
    def predict_one(self, x: dict[typing.FeatureName, Any]) -> int

Import

from river.base import Clusterer

I/O Contract

learn_one

Parameter	Type	Description
x	dict[FeatureName, Any]	Dictionary of features to learn from (no target label)

predict_one

Parameter	Type	Description
x	dict[FeatureName, Any]	Dictionary of features to cluster

Returns	Type	Description
cluster_id	int	The assigned cluster number for the input features

Properties

Property	Type	Description
_supervised	bool	Always returns False for clustering (unsupervised learning)
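One plausible use of the `_supervised` flag is to let generic code decide whether a target should be passed to `learn_one`. The sketch below uses only the standard library; `ToyClusterer`, `ToyClassifier`, and `learn_step` are hypothetical names introduced for illustration and do not describe how River itself dispatches.

```python
class ToyClusterer:
    """Hypothetical unsupervised model: learn_one takes features only."""
    _supervised = False

    def __init__(self):
        self.seen = []

    def learn_one(self, x):
        self.seen.append(x)


class ToyClassifier:
    """Hypothetical supervised model: learn_one takes features and a target."""
    _supervised = True

    def __init__(self):
        self.seen = []

    def learn_one(self, x, y):
        self.seen.append((x, y))


def learn_step(model, x, y=None):
    # Pass the target only to models that declare themselves supervised.
    if model._supervised:
        model.learn_one(x, y)
    else:
        model.learn_one(x)


clusterer, classifier = ToyClusterer(), ToyClassifier()
learn_step(clusterer, {"a": 1.0}, y="ignored")
learn_step(classifier, {"a": 1.0}, y="label")
```

After these calls the clusterer has stored only the feature dictionary, while the classifier has stored the feature dictionary together with its label.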

Usage Examples

import random

from river import cluster

# Create an incremental k-means clusterer
model = cluster.KMeans(n_clusters=3, seed=42)

# Generate some synthetic 2-D data
random.seed(42)
X = [
    {'x': random.gauss(0, 1), 'y': random.gauss(0, 1)}
    for _ in range(100)
]

# Online clustering: predict first, then update
for x in X:
    # Predict the cluster assignment with the model as it currently stands
    cluster_id = model.predict_one(x)

    # Update the model with the same unlabeled example
    model.learn_one(x)

    print(f"Point {x} assigned to cluster {cluster_id}")

# Implementing a custom clusterer
from river.base import Clusterer

class SimpleCentroidClusterer(Clusterer):
    """Toy clusterer: the first n_clusters points seed the centroids;
    later points update the running mean of their nearest centroid."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters
        self.centroids = {}
        self.counts = {}

    def learn_one(self, x):
        # Seed a new centroid until n_clusters centroids exist
        if len(self.centroids) < self.n_clusters:
            cluster_id = len(self.centroids)
            self.centroids[cluster_id] = dict(x)
            self.counts[cluster_id] = 1
            return

        # Otherwise move the nearest centroid toward x (running mean)
        cluster_id = self.predict_one(x)
        n = self.counts[cluster_id]
        centroid = self.centroids[cluster_id]
        for key in set(centroid) | set(x):
            old_val = centroid.get(key, 0)
            centroid[key] = (old_val * n + x.get(key, 0)) / (n + 1)
        self.counts[cluster_id] += 1

    def predict_one(self, x):
        # No centroid yet: fall back to cluster 0 without changing any state
        if not self.centroids:
            return 0

        # Find the nearest centroid by squared Euclidean distance,
        # treating features missing on either side as 0
        min_dist = float('inf')
        best_cluster = 0
        for cluster_id, centroid in self.centroids.items():
            keys = set(centroid) | set(x)
            dist = sum((x.get(k, 0) - centroid.get(k, 0)) ** 2 for k in keys)
            if dist < min_dist:
                min_dist = dist
                best_cluster = cluster_id

        return best_cluster
