Implementation:Online ml River Cluster ODAC
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Clustering, Hierarchical_Clustering, Time_Series |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Online Divisive-Agglomerative Clustering (ODAC) continuously maintains a hierarchical cluster structure from evolving time series data streams.
Description
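ODAC is a hierarchical clustering algorithm designed for streaming time series data. It measures the dissimilarity between two series with the rnomc distance, which is derived from the Pearson correlation: rnomc(a, b) = sqrt((1 - corr(a, b)) / 2). Because corr(a, b) lies in [-1, 1], rnomc lies in [0, 1], so the statistic monitored by the Hoeffding bound has a known range. The algorithm continuously monitors how cluster diameters evolve, and it splits or merges clusters when a statistical test based on the Hoeffding bound indicates that the change is significant.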
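
Because the correlation can be maintained from a handful of running sums, the distance between two streams can be updated in constant time per observation. The sketch below illustrates that idea. `RunningPairStats` is a hypothetical helper, not part of River; it uses naive running sums rather than a numerically stabler update, and it treats a constant series as uncorrelated (an assumption made only to avoid dividing by zero).

```python
import math


class RunningPairStats:
    # Hypothetical helper: running sums for the Pearson correlation of two streams.

    def __init__(self) -> None:
        self.n = 0
        self.sum_a = self.sum_b = 0.0
        self.sum_aa = self.sum_bb = self.sum_ab = 0.0

    def update(self, a: float, b: float) -> None:
        # O(1) work per observation; past values are never stored.
        self.n += 1
        self.sum_a += a
        self.sum_b += b
        self.sum_aa += a * a
        self.sum_bb += b * b
        self.sum_ab += a * b

    def corr(self) -> float:
        if self.n < 2:
            return 0.0
        cov = self.sum_ab - self.sum_a * self.sum_b / self.n
        var_a = self.sum_aa - self.sum_a ** 2 / self.n
        var_b = self.sum_bb - self.sum_b ** 2 / self.n
        if var_a <= 0.0 or var_b <= 0.0:
            return 0.0  # assumption: a constant stream is treated as uncorrelated
        return cov / math.sqrt(var_a * var_b)

    def rnomc(self) -> float:
        # rnomc(a, b) = sqrt((1 - corr(a, b)) / 2); clamp corr to [-1, 1] first
        r = max(-1.0, min(1.0, self.corr()))
        return math.sqrt((1.0 - r) / 2.0)


same = RunningPairStats()
opposite = RunningPairStats()
for t in range(50):
    v = math.sin(t / 5)
    same.update(v, 2 * v + 1)  # linearly related: corr = 1
    opposite.update(v, -v)     # mirrored: corr = -1
print(same.rnomc(), opposite.rnomc())  # approximately 0.0 and 1.0
```

Keeping sums instead of raw values is what lets a streaming algorithm avoid storing the series; River's actual bookkeeping for ODAC may differ.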
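The split operator is triggered when the difference between the largest pairwise distance in a cluster (its diameter, d1) and the second-largest distance (d2) exceeds the Hoeffding bound; the tau parameter additionally forces a split, breaking the tie between d1 and d2, once the bound falls below tau. The merge operator checks whether a child cluster's diameter has grown larger than its parent's diameter, again using the Hoeffding bound to ensure that the difference is statistically significant.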
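
The two tests can be sketched as plain functions. This is an illustration under stated assumptions, not River's code: the names `hoeffding_bound`, `should_split` and `should_merge` are hypothetical, δ = 1 - confidence_level and R = 1 (the range of rnomc) are assumed, and the merge rule is a simplified reading of the description above (the child's diameter exceeds the parent's by more than ε).

```python
import math


def hoeffding_bound(n: int, confidence_level: float, value_range: float = 1.0) -> float:
    # Assumed textbook form: eps = sqrt(R^2 * ln(1 / delta) / (2 n)), delta = 1 - confidence_level.
    delta = 1.0 - confidence_level
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(d1: float, d2: float, n: int, confidence_level: float, tau: float) -> bool:
    # Split when the gap between the diameter (d1) and the second-largest
    # distance (d2) is significant, or when eps has shrunk below tau, in
    # which case d1 and d2 are treated as a tie and the split is forced.
    eps = hoeffding_bound(n, confidence_level)
    return (d1 - d2) > eps or tau > eps


def should_merge(child_diameter: float, parent_diameter: float, n: int, confidence_level: float) -> bool:
    # Hypothetical merge test: the child's diameter exceeds its parent's by
    # more than eps, suggesting the earlier split no longer reflects the data.
    eps = hoeffding_bound(n, confidence_level)
    return (child_diameter - parent_diameter) > eps


print(should_split(0.60, 0.30, n=100, confidence_level=0.9, tau=0.1))  # True: clear gap
print(should_split(0.60, 0.55, n=100, confidence_level=0.9, tau=0.1))  # False: gap below eps
print(should_split(0.60, 0.59, n=200, confidence_level=0.9, tau=0.1))  # True: tie forced by tau
```

With these assumptions ε ≈ 0.107 at n = 100 and ε ≈ 0.076 at n = 200, which is why the third call splits even though d1 and d2 are nearly equal.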
ODAC monitors only leaf clusters for split and merge operations, which keeps the per-observation cost low enough for real-time processing. Whenever a split or merge changes the structure during an update, the structure_changed flag is set to True, letting users track how the hierarchy evolves.
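
The leaf-only monitoring and the flag can be illustrated with a toy tree. Everything here is hypothetical: `ToyNode`, `ToyHierarchy` and `split_in_half` stand in for River's internal classes and for the Hoeffding-based test, and a merge (not shown) would set the flag in the same way. The summary keys mirror the n_clusters, n_active_clusters and height properties listed for ODAC, but the height convention (a lone root has height 0) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class ToyNode:
    # Hypothetical cluster node; River's internal node class is different.
    series: List[str]
    children: List["ToyNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: ToyNode) -> Iterator[ToyNode]:
    # Walk the tree and yield only leaf (active) clusters.
    if node.is_leaf():
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


SplitRule = Callable[[ToyNode], Optional[List[str]]]


class ToyHierarchy:
    def __init__(self, names: List[str]) -> None:
        self.root = ToyNode(series=list(names))
        self.structure_changed = False

    def step(self, split_rule: SplitRule) -> None:
        # Reset the flag, then test only the leaves; inner nodes are never checked.
        self.structure_changed = False
        for leaf in list(leaves(self.root)):
            left = split_rule(leaf)
            if left:
                right = [s for s in leaf.series if s not in left]
                leaf.children = [ToyNode(list(left)), ToyNode(right)]
                self.structure_changed = True

    def summary(self) -> Dict[str, int]:
        def count(n: ToyNode) -> int:
            return 1 + sum(count(c) for c in n.children)

        def height(n: ToyNode) -> int:
            # Assumption: a lone root has height 0.
            return 0 if n.is_leaf() else 1 + max(height(c) for c in n.children)

        return {
            "n_clusters": count(self.root),
            "n_active_clusters": sum(1 for _ in leaves(self.root)),
            "height": height(self.root),
        }


def split_in_half(leaf: ToyNode) -> Optional[List[str]]:
    # Hypothetical rule standing in for the Hoeffding-based split test.
    return leaf.series[: len(leaf.series) // 2] if len(leaf.series) > 2 else None


h = ToyHierarchy(["a", "b", "c", "d"])
h.step(split_in_half)
first_changed = h.structure_changed   # root split into {a, b} and {c, d}
h.step(split_in_half)
second_changed = h.structure_changed  # leaves of size 2 are left alone
print(first_changed, second_changed, h.summary())
# True False {'n_clusters': 3, 'n_active_clusters': 2, 'height': 1}
```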
Usage
Use ODAC when you need to discover and maintain hierarchical cluster structures in streaming time series data, especially when the number of clusters is unknown and may change over time due to concept drift. It's particularly useful for monitoring systems where the relationships between time series evolve dynamically.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/cluster/odac.py
Signature
```python
class ODAC(base.Clusterer):
    def __init__(self, confidence_level: float = 0.9, n_min: int = 100, tau: float = 0.1):
        ...
```
Import
```python
from river import cluster
model = cluster.ODAC()
```
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| x | dict | Dictionary of time series observations with feature names as keys |

| Method | Return Type | Description |
|---|---|---|
| learn_one(x) | None | Updates the hierarchical cluster structure with one observation |
| render_ascii(n_decimal_places) | str | Returns an ASCII representation of the tree structure |
| draw(max_depth, show_clusters_info, n_decimal_places) | graphviz.Digraph | Returns a Graphviz visualization of the tree |

| Name | Type | Default | Description |
|---|---|---|---|
| confidence_level | float | 0.9 | Confidence level of the Hoeffding bound (between 0 and 1) |
| n_min | int | 100 | Minimum number of observations to gather before checking whether clusters should be split or merged |
| tau | float | 0.1 | Threshold compared against the Hoeffding bound to force a split and break ties (must be > 0) |
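
The interplay of confidence_level, n_min and tau can be made concrete with a little arithmetic. The sketch below assumes the textbook Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) with R = 1 (the range of rnomc) and δ = 1 - confidence_level; River's internal formula may differ in detail, and `epsilon` is a hypothetical helper. Under that assumption, the defaults give ε ≈ 0.107 at n = 100, so tau = 0.1 cannot force a tie-breaking split until a cluster has gathered 116 observations.

```python
import math


def epsilon(n: int, confidence_level: float = 0.9, value_range: float = 1.0) -> float:
    # Assumed textbook Hoeffding bound with delta = 1 - confidence_level.
    delta = 1.0 - confidence_level
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


tau = 0.1
for n in (100, 116, 200, 500):
    # epsilon shrinks like 1 / sqrt(n) as a cluster gathers observations
    print(n, round(epsilon(n), 4), "tau can force a split" if epsilon(n) < tau else "")

# Smallest n with epsilon(n) < tau, from solving ln(1 / delta) / (2 n) < tau ** 2.
n_tie = math.floor(math.log(1.0 / (1.0 - 0.9)) / (2.0 * tau ** 2)) + 1
print(n_tie)  # 116
```

Raising confidence_level enlarges ε for every n, so both significant splits and tie-forced splits need more observations.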

| Property | Type | Description |
|---|---|---|
| structure_changed | bool | True if the most recent update changed the structure through a split or merge |
| n_clusters | int | Total number of clusters (nodes) in the hierarchy |
| n_active_clusters | int | Number of active (leaf) clusters |
| height | int | Height of the hierarchical tree |
| summary | dict | Dictionary with n_clusters, n_active_clusters, and height |
Usage Examples
```python
from river import cluster
from river.datasets import synth

model = cluster.ODAC(confidence_level=0.9, n_min=100, tau=0.1)
dataset = synth.FriedmanDrift(drift_type='gra', position=(150, 200), seed=42)

# Learn one observation at a time and report structural changes
for i, (x, _) in enumerate(dataset.take(500)):
    model.learn_one(x)
    if model.structure_changed:
        print(f"Structure changed at observation {i + 1}")

# Display the hierarchical structure
print(model.render_ascii())

# Access properties
print(f"Number of clusters: {model.n_clusters}")
print(f"Number of active clusters: {model.n_active_clusters}")
print(f"Tree height: {model.height}")
print(model.summary)

# Visualize with Graphviz (requires the graphviz package)
# graph = model.draw(max_depth=3, show_clusters_info=['timeseries_names', 'd1', 'd2'])
# graph.render('odac_tree', format='png')
```