Overview
The Clusterer class is an abstract base class that defines the interface for all clustering algorithms in River.
Description
The Clusterer class extends Estimator to provide the standard interface for unsupervised clustering models in River. It defines two abstract methods that all clustering algorithms must implement: learn_one for updating the model with a single unlabeled example (features only, no target), and predict_one for assigning a cluster number to a given set of features. The _supervised property returns False, indicating that clustering is an unsupervised learning task that does not require target labels during training.
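The role of the _supervised flag is easiest to see in generic code that must decide whether to pass a target. The sketch below uses only the standard library. Estimator and Clusterer here are simplified stand-ins that mirror the interface described above; they are not River's source. AlwaysZero, MeanRegressor and the learn helper are hypothetical, and the helper only approximates how something like a pipeline step might dispatch on the flag.

```python
import abc
from typing import Any, Hashable, Optional


class Estimator(abc.ABC):
    """Simplified stand-in for river.base.Estimator: supervised unless a subclass says otherwise."""

    @property
    def _supervised(self) -> bool:
        return True


class Clusterer(Estimator):
    """Simplified stand-in for river.base.Clusterer."""

    @property
    def _supervised(self) -> bool:
        return False

    @abc.abstractmethod
    def learn_one(self, x: dict[Hashable, Any]) -> None: ...

    @abc.abstractmethod
    def predict_one(self, x: dict[Hashable, Any]) -> int: ...


class AlwaysZero(Clusterer):
    """Toy clusterer: puts every point in cluster 0 and counts the points it has seen."""

    def __init__(self):
        self.n_seen = 0

    def learn_one(self, x):
        self.n_seen += 1

    def predict_one(self, x):
        return 0


class MeanRegressor(Estimator):
    """Toy supervised estimator: accumulates y (inherits _supervised=True)."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def learn_one(self, x, y):
        self.total += y
        self.count += 1


def learn(model: Estimator, x: dict, y: Optional[Any] = None) -> None:
    """Hypothetical helper: pass y only to estimators that declare themselves supervised."""
    if model._supervised:
        model.learn_one(x, y)
    else:
        model.learn_one(x)  # unsupervised: features only, y is dropped


clusterer, regressor = AlwaysZero(), MeanRegressor()
learn(clusterer, {"a": 1.0}, y=5)
learn(regressor, {"a": 1.0}, y=5)
print(clusterer.n_seen, regressor.total)  # 1 5.0
```

The clusterer is called with features only even though a target was supplied, which is exactly what the False value of _supervised signals.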
Usage
Use Clusterer as the parent class when implementing new online clustering algorithms that learn from individual examples without supervision. Subclasses must implement both learn_one and predict_one; because both are declared with @abc.abstractmethod, a subclass that leaves either one out cannot be instantiated. Cluster numbers are typically integers starting from 0, though the exact numbering scheme depends on the algorithm.
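What "must implement" means in practice can be shown with a stdlib-only sketch. It uses a minimal stand-in for Clusterer (not River's class) that keeps only the two abstract methods, and assumes River's base classes rely on the abc machinery in the same way, as the @abc.abstractmethod decorators in the signature suggest. LearnOnly and SignClusterer are hypothetical toy classes.

```python
import abc


class Clusterer(abc.ABC):
    """Minimal stand-in for river.base.Clusterer: only the two abstract methods."""

    @abc.abstractmethod
    def learn_one(self, x: dict) -> None: ...

    @abc.abstractmethod
    def predict_one(self, x: dict) -> int: ...


class LearnOnly(Clusterer):
    """Incomplete subclass: predict_one is missing, so it stays abstract."""

    def learn_one(self, x):
        pass


class SignClusterer(Clusterer):
    """Toy clusterer: cluster 0 for negative 'v', cluster 1 otherwise."""

    def learn_one(self, x):
        pass  # stateless, so there is nothing to update

    def predict_one(self, x):
        return 0 if x["v"] < 0 else 1


try:
    LearnOnly()
except TypeError as err:
    # Python refuses to instantiate a class with unimplemented abstract methods
    print("cannot instantiate:", err)

model = SignClusterer()
print([model.predict_one({"v": v}) for v in (-2.0, 0.5, -0.1)])  # [0, 1, 0]
```

The complete subclass returns integer cluster numbers starting at 0, matching the convention described above.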
Code Reference
Source Location
Signature
```python
class Clusterer(estimator.Estimator):
    """A clustering model."""

    @property
    def _supervised(self) -> bool: ...

    @abc.abstractmethod
    def learn_one(self, x: dict[typing.FeatureName, Any]) -> None: ...

    @abc.abstractmethod
    def predict_one(self, x: dict[typing.FeatureName, Any]) -> int: ...
```
Import
```python
from river.base import Clusterer
```
I/O Contract
learn_one
| Parameter | Type | Description |
|-----------|------|-------------|
| x | dict[FeatureName, Any] | Dictionary of features to learn from (no target label) |
predict_one
| Parameter | Type | Description |
|-----------|------|-------------|
| x | dict[FeatureName, Any] | Dictionary of features to cluster |

| Returns | Type | Description |
|---------|------|-------------|
| cluster_id | int | The cluster number assigned to the input features |
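The I/O contract above can be checked mechanically on a stream of examples. The sketch below is a hypothetical helper, not part of River: check_clusterer_contract walks a stream in predict-then-learn order, verifies that predict_one returns an int and that learn_one returns None, and is exercised on RunningMeanSplit, a made-up toy clusterer. It relies on duck typing, so it does not import River.

```python
from typing import Any, Iterable


def check_clusterer_contract(model: Any, xs: Iterable[dict]) -> int:
    """Hypothetical helper: check the I/O contract on each example; return how many were checked."""
    n = 0
    for x in xs:
        cluster_id = model.predict_one(x)
        # predict_one must return an int cluster number (bool is rejected even though it subclasses int)
        if not isinstance(cluster_id, int) or isinstance(cluster_id, bool):
            raise TypeError(f"predict_one returned {cluster_id!r}, expected int")
        # learn_one receives features only and must return None
        if model.learn_one(x) is not None:
            raise TypeError("learn_one should return None")
        n += 1
    return n


class RunningMeanSplit:
    """Toy clusterer: cluster 1 if 'v' is above the running mean of 'v' seen so far, else 0."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def learn_one(self, x):
        self.total += x["v"]
        self.count += 1

    def predict_one(self, x):
        mean = self.total / self.count if self.count else 0.0
        return int(x["v"] > mean)


stream = [{"v": v} for v in (1.0, 3.0, 2.0, 10.0)]
print(check_clusterer_contract(RunningMeanSplit(), stream))  # 4
```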
Properties
| Property | Type | Description |
|----------|------|-------------|
| _supervised | bool | Always False, because clustering is unsupervised |
Usage Examples
```python
import random

from river import cluster

# Create a clusterer
model = cluster.KMeans(n_clusters=3, seed=42)

# Generate some synthetic 2D data
random.seed(42)
X = [
    {'x': random.gauss(0, 1), 'y': random.gauss(0, 1)}
    for _ in range(100)
]

# Online clustering: predict each point's cluster first, then update the model with it
for x in X:
    cluster_id = model.predict_one(x)
    model.learn_one(x)
    print(f"Point {x} assigned to cluster {cluster_id}")
```
```python
# Implementing a custom clusterer
from river.base import Clusterer


class SimpleCentroidClusterer(Clusterer):
    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters
        self.centroids = {}  # cluster_id -> {feature: running mean}
        self.counts = {}     # cluster_id -> number of points absorbed

    def learn_one(self, x):
        # Seed a new cluster with each of the first n_clusters points
        if len(self.centroids) < self.n_clusters:
            cluster_id = len(self.centroids)
            self.centroids[cluster_id] = dict(x)
            self.counts[cluster_id] = 1
            return

        # Otherwise, move the nearest centroid toward x (running mean)
        cluster_id = self.predict_one(x)
        centroid = self.centroids[cluster_id]
        n = self.counts[cluster_id]
        # Missing features are treated as 0 on both sides
        for key in centroid.keys() | x.keys():
            old_val = centroid.get(key, 0)
            centroid[key] = (old_val * n + x.get(key, 0)) / (n + 1)
        self.counts[cluster_id] = n + 1

    def predict_one(self, x):
        # Read-only: predicting must not change the model's state
        if not self.centroids:
            return 0  # nothing learned yet, default to cluster 0

        # Find the nearest centroid by squared Euclidean distance
        min_dist = float('inf')
        best_cluster = 0
        for cluster_id, centroid in self.centroids.items():
            dist = sum(
                (x.get(k, 0) - centroid.get(k, 0)) ** 2
                for k in centroid.keys() | x.keys()
            )
            if dist < min_dist:
                min_dist = dist
                best_cluster = cluster_id
        return best_cluster
```