Implementation:Online ml River Cluster ODAC

From Leeroopedia


Knowledge Sources
Domains: Online_Learning, Clustering, Hierarchical_Clustering, Time_Series
Last Updated: 2026-02-08 16:00 GMT

Overview

Online Divisive-Agglomerative Clustering (ODAC) continuously maintains a hierarchical cluster structure from evolving time series data streams.

Description

ODAC is a hierarchical clustering algorithm designed for streaming time series data. It measures the dissimilarity between two series with the rooted normalized one-minus-correlation, derived from the Pearson correlation: rnomc(a, b) = sqrt((1 - corr(a, b)) / 2). The value ranges from 0 (perfectly correlated) to 1 (perfectly anti-correlated). The algorithm continuously monitors the evolution of cluster diameters and dynamically splits or merges clusters based on statistical tests using the Hoeffding bound.
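
To make the distance formula concrete, the following is a minimal batch sketch in plain Python. The helper names pearson_corr and rnomc are illustrative only and are not part of River's API; ODAC itself updates the correlation incrementally from the stream rather than recomputing it over stored lists as done here.

```python
import math


def pearson_corr(a, b):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rnomc(a, b):
    """Distance sqrt((1 - corr(a, b)) / 2), as given in the description above."""
    corr = pearson_corr(a, b)
    # Clamp to [-1, 1] so floating-point rounding cannot produce a negative radicand.
    corr = max(-1.0, min(1.0, corr))
    return math.sqrt((1.0 - corr) / 2.0)


# Perfectly correlated series are at distance ~0, anti-correlated series at distance 1.
print(rnomc([1, 2, 3, 4], [2, 4, 6, 8]))
print(rnomc([1, 2, 3, 4], [8, 6, 4, 2]))
```

Because the distance depends only on correlation, two series with very different scales but the same shape end up close together.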

The split operator triggers when the difference between a cluster's largest pairwise distance (its diameter) and its second-largest distance exceeds the Hoeffding bound at the configured confidence level. The merge operator checks whether a child cluster's diameter is larger than its parent's diameter, again using the Hoeffding bound to ensure the difference is statistically significant.
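
The split and merge tests described above can be sketched as follows. This is a simplified illustration under stated assumptions, not River's internal code: the function names are hypothetical, the bound uses the textbook Hoeffding form with a failure probability delta (how River derives delta from confidence_level is not stated here), and the tau rule is one reading of "force splits and break ties". The full algorithm applies further conditions that are omitted.

```python
import math


def hoeffding_bound(n, delta, value_range=1.0):
    """Textbook Hoeffding bound for n observations of a variable spanning value_range.

    value_range defaults to 1.0 because the rnomc distance lies in [0, 1].
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(d1, d2, epsilon, tau):
    """Simplified split test (assumption, not River's exact rule).

    d1 is the largest pairwise distance (the diameter), d2 the second largest.
    Split when the gap is significant, or when epsilon has shrunk below tau,
    meaning more data is unlikely to separate the two candidates.
    """
    return (d1 - d2) > epsilon or tau > epsilon


def should_merge(child_diameter, parent_diameter, epsilon):
    """Simplified merge test: the child is significantly wider than its parent."""
    return (child_diameter - parent_diameter) > epsilon


# Hypothetical numbers: 100 observations, delta = 0.1.
eps = hoeffding_bound(n=100, delta=0.1)
print(round(eps, 4))
print(should_split(d1=0.9, d2=0.5, epsilon=eps, tau=0.1))
```

The bound shrinks as n grows, so a gap between d1 and d2 that is inconclusive early in the stream can justify a split once more observations have been seen.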

ODAC monitors only leaf clusters for splitting and merging, which keeps it efficient for real-time processing. Whenever a split or merge changes the structure, the structure_changed flag is set to True, allowing users to track structural evolution.

Usage

Use ODAC when you need to discover and maintain hierarchical cluster structures in streaming time series data, especially when the number of clusters is unknown and may change over time due to concept drift. It's particularly useful for monitoring systems where the relationships between time series evolve dynamically.

Code Reference

Source Location

Signature

class ODAC(base.Clusterer):
    def __init__(self, confidence_level: float = 0.9, n_min: int = 100, tau: float = 0.1):
        ...

Import

from river import cluster
model = cluster.ODAC()

I/O Contract

Input

| Parameter | Type | Description |
|---|---|---|
| x | dict | Dictionary of time series observations with feature names as keys |

Output

| Method | Return Type | Description |
|---|---|---|
| learn_one(x) | None | Updates the hierarchical cluster structure |
| render_ascii(n_decimal_places) | str | Returns ASCII representation of tree structure |
| draw(max_depth, show_clusters_info, n_decimal_places) | graphviz.Digraph | Returns Graphviz visualization |

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| confidence_level | float | 0.9 | Confidence level for Hoeffding bound (between 0 and 1) |
| n_min | int | 100 | Minimum observations before checking for splits/merges |
| tau | float | 0.1 | Threshold to force splits and break ties (must be > 0) |

Properties

| Property | Type | Description |
|---|---|---|
| structure_changed | bool | True when structure changed via split or merge |
| n_clusters | int | Total number of clusters in the hierarchy |
| n_active_clusters | int | Number of active (leaf) clusters |
| height | int | Height of the hierarchical tree |
| summary | dict | Dictionary with n_clusters, n_active_clusters, and height |

Usage Examples

from river import cluster
from river.datasets import synth

model = cluster.ODAC(confidence_level=0.9, n_min=100, tau=0.1)

dataset = synth.FriedmanDrift(drift_type='gra', position=(150, 200), seed=42)

for i, (x, _) in enumerate(dataset.take(500)):
    model.learn_one(x)
    if model.structure_changed:
        print(f"Structure changed at observation {i + 1}")

# Display the hierarchical structure
print(model.render_ascii())

# Access properties
print(f"Number of clusters: {model.n_clusters}")
print(f"Number of active clusters: {model.n_active_clusters}")
print(f"Tree height: {model.height}")
print(model.summary)

# Visualize with Graphviz (if installed)
# graph = model.draw(max_depth=3, show_clusters_info=['timeseries_names', 'd1', 'd2'])
# graph.render('odac_tree', format='png')
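
As a further illustration of the input contract, here is a hedged sketch that feeds ODAC a hand-built stream instead of a River dataset. The generator make_stream, the wrapper run_odac, and the series names a1, a2, b1, b2 are all hypothetical; only cluster.ODAC, learn_one, and render_ascii come from the reference above. Each observation is a dict mapping a series name to its current value, with two pairs of series driven by separate latent signals.

```python
import math
import random


def make_stream(n_steps, seed=0):
    """Yield one dict per time step: series name -> current value (hypothetical data)."""
    rng = random.Random(seed)
    for t in range(n_steps):
        latent_a = math.sin(t / 10)   # shared signal behind a1 and a2
        latent_b = rng.gauss(0, 1)    # shared signal behind b1 and b2
        yield {
            "a1": latent_a + rng.gauss(0, 0.05),
            "a2": 2 * latent_a + rng.gauss(0, 0.05),
            "b1": latent_b + rng.gauss(0, 0.05),
            "b2": 0.5 * latent_b + rng.gauss(0, 0.05),
        }


def run_odac(stream):
    """Train an ODAC model on the stream and return it."""
    # Imported here so the generator above can be used without River installed.
    from river import cluster

    model = cluster.ODAC(confidence_level=0.9, n_min=100, tau=0.1)
    for x in stream:
        model.learn_one(x)
    return model
```

Calling model = run_odac(make_stream(1000)) followed by print(model.render_ascii()) shows whichever hierarchy ODAC arrives at. Whether the two pairs end up in separate leaves depends on the data and on the parameters, so no particular output is claimed here.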

Related Pages
