Implementation:Online ml River Base Clusterer

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Clustering, Base_Classes
Last Updated: 2026-02-08 16:00 GMT

Overview

The Clusterer class is an abstract base class that defines the interface for all clustering algorithms in River.

Description

The Clusterer class extends Estimator to provide the standard interface for unsupervised clustering models in River. It defines two abstract methods that all clustering algorithms must implement: learn_one for updating the model with a single unlabeled example (features only, no target), and predict_one for assigning a cluster number to a given set of features. The _supervised property returns False, indicating that clustering is an unsupervised learning task that does not require target labels during training.
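
The abstract-method contract described above can be illustrated without installing River. The following is a minimal sketch using only the standard library; `ClustererSketch`, `Incomplete`, `SingleCluster`, and `can_instantiate` are hypothetical stand-ins rather than River names, and the real Clusterer additionally inherits behaviour from Estimator.

```python
import abc
from typing import Any


class ClustererSketch(abc.ABC):
    """Hypothetical stand-in that mirrors the shape of River's Clusterer."""

    @property
    def _supervised(self) -> bool:
        # Clustering needs no target labels
        return False

    @abc.abstractmethod
    def learn_one(self, x: dict[str, Any]) -> None:
        """Update the model with one unlabeled example."""

    @abc.abstractmethod
    def predict_one(self, x: dict[str, Any]) -> int:
        """Return the cluster number for one example."""


class Incomplete(ClustererSketch):
    # Implements only one of the two abstract methods
    def learn_one(self, x):
        pass


class SingleCluster(ClustererSketch):
    # Trivial but complete: every point belongs to cluster 0
    def learn_one(self, x):
        pass

    def predict_one(self, x):
        return 0


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

In this sketch, a subclass that omits either method cannot be instantiated, which is the practical consequence of marking both methods abstract.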

Usage

Use Clusterer as the parent class when implementing new online clustering algorithms that learn from individual examples without supervision. All clusterers must implement both learn_one and predict_one methods. Cluster numbers are typically integers starting from 0, though the specific numbering scheme depends on the algorithm implementation.
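
Because the numbering scheme depends on the algorithm, downstream code is usually safer keying on whatever IDs predict_one returns instead of assuming a fixed range. The sketch below tallies cluster sizes that way; `SignClusterer` and `tally_assignments` are hypothetical names introduced for illustration, and the toy model only duck-types the learn_one/predict_one interface instead of subclassing River's Clusterer.

```python
from collections import Counter


class SignClusterer:
    """Hypothetical toy model: cluster 0 for negative 'x', cluster 1 otherwise."""

    def learn_one(self, x):
        # Nothing to update in this toy model
        return None

    def predict_one(self, x):
        return 0 if x["x"] < 0 else 1


def tally_assignments(model, points):
    """Predict, then learn, for each point; count points per returned cluster ID."""
    sizes = Counter()
    for x in points:
        sizes[model.predict_one(x)] += 1
        model.learn_one(x)
    return sizes


points = [{"x": -2.0}, {"x": -0.5}, {"x": 1.5}]
sizes = tally_assignments(SignClusterer(), points)
```

A Counter makes no assumption that IDs are contiguous or start at 0, so the same helper should work with algorithms that create or retire clusters over time.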

Code Reference

Source Location

Signature

class Clusterer(estimator.Estimator):
    """A clustering model."""

    @property
    def _supervised(self) -> bool

    @abc.abstractmethod
    def learn_one(self, x: dict[typing.FeatureName, Any]) -> None

    @abc.abstractmethod
    def predict_one(self, x: dict[typing.FeatureName, Any]) -> int

Import

from river.base import Clusterer

I/O Contract

learn_one

Parameter   Type                     Description
x           dict[FeatureName, Any]   Dictionary of features to learn from (no target label)

predict_one

Parameter   Type                     Description
x           dict[FeatureName, Any]   Dictionary of features to cluster

Returns     Type                     Description
cluster_id  int                      The assigned cluster number for the input features

Properties

Property     Type   Description
_supervised  bool   Always returns False for clustering (unsupervised learning)
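
One plausible use of the _supervised flag is to let generic training code decide whether to pass a target to learn_one. The sketch below shows that idea under stated assumptions: `learn_step`, `RecordingClusterer`, and `RecordingClassifier` are hypothetical names, the two model classes are plain stand-ins rather than River estimators, and River's own utilities may use the flag differently.

```python
def learn_step(model, x, y=None):
    """Pass y only to models that declare themselves supervised."""
    if model._supervised:
        model.learn_one(x, y)
    else:
        model.learn_one(x)


class RecordingClusterer:
    """Hypothetical unsupervised stand-in that records what it was given."""

    _supervised = False

    def __init__(self):
        self.seen = []

    def learn_one(self, x):
        self.seen.append(x)


class RecordingClassifier:
    """Hypothetical supervised stand-in that records (x, y) pairs."""

    _supervised = True

    def __init__(self):
        self.seen = []

    def learn_one(self, x, y):
        self.seen.append((x, y))


clusterer = RecordingClusterer()
classifier = RecordingClassifier()

# The same call works for both kinds of model; y is dropped for the clusterer
learn_step(clusterer, {"x": 1.0}, y=True)
learn_step(classifier, {"x": 1.0}, y=True)
```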

Usage Examples

import random

from river import cluster

# Create a clusterer
model = cluster.KMeans(n_clusters=3, seed=42)

# Generate some synthetic data
random.seed(42)
X = [
    {'x': random.gauss(0, 1), 'y': random.gauss(0, 1)}
    for _ in range(100)
]

# Online clustering
for x in X:
    # Predict cluster assignment
    cluster_id = model.predict_one(x)

    # Update the model
    model.learn_one(x)

    print(f"Point {x} assigned to cluster {cluster_id}")

# Implementing a custom clusterer
from river.base import Clusterer

class SimpleCentroidClusterer(Clusterer):
    """Toy clusterer: seeds up to n_clusters centroids, then tracks running means."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters
        self.centroids = {}
        self.counts = {}

    def learn_one(self, x):
        # The first n_clusters points each seed a new centroid (IDs 0, 1, ...)
        if len(self.centroids) < self.n_clusters:
            cluster_id = len(self.centroids)
            self.centroids[cluster_id] = dict(x)
            self.counts[cluster_id] = 1
            return

        # Afterwards, move the nearest centroid towards x (running mean)
        cluster_id = self.predict_one(x)
        centroid = self.centroids[cluster_id]
        n = self.counts[cluster_id]
        for key in centroid.keys() | x.keys():
            # Features missing from either side are treated as 0
            centroid[key] = (centroid.get(key, 0) * n + x.get(key, 0)) / (n + 1)
        self.counts[cluster_id] = n + 1

    def predict_one(self, x):
        # Read-only: with no centroids yet, fall back to cluster 0
        if not self.centroids:
            return 0

        # Find the nearest centroid by squared Euclidean distance
        min_dist = float('inf')
        best_cluster = 0
        for cluster_id, centroid in self.centroids.items():
            dist = sum(
                (x.get(k, 0) - centroid.get(k, 0)) ** 2
                for k in centroid.keys() | x.keys()
            )
            if dist < min_dist:
                min_dist = dist
                best_cluster = cluster_id

        return best_cluster
