Implementation:Online ml River Sketch Counter
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Streaming_Algorithms, Approximate_Counting |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Count-Min Sketch algorithm for approximate frequency counting with fixed memory footprint.
Description
Implements the Count-Min Sketch (CMS) algorithm for counting hashable elements using a fixed-size matrix of counters. Provides approximate counts with probabilistic guarantees controlled by epsilon (error rate) and delta (confidence). Supports dot product operations for similarity computations and handles both positive and negative updates (turnstile model).
Usage
Use for frequency estimation in streams when exact counting requires too much memory, such as tracking IP addresses, URLs, or words in text streams. Ideal for heavy hitters detection, frequency-based filtering, or approximate join size estimation.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/sketch/counter.py
Signature
class Counter(base.Base):
def __init__(self, epsilon: float = 0.1, delta: float = 0.05, seed: int | None = None):
...
def update(self, x: typing.Hashable, w: int = 1):
...
def __getitem__(self, x) -> int:
...
def __matmul__(self, other: Counter) -> int:
...
def total(self) -> int:
...
Import
from river import sketch
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| epsilon | float | Approximation error parameter (default: 0.1) |
| delta | float | Confidence parameter (default: 0.05) |
| seed | int or None | Random seed for hash functions |
Usage Examples
import collections
import random
from river import sketch
cms = sketch.Counter(epsilon=0.005, seed=0)
rng = random.Random(7)
# Compare with exact counter
counter = collections.Counter()
for _ in range(10000):
v = rng.randint(-1000, 1000)
cms.update(v)
counter[v] += 1
print(f"CMS dimensions: {cms.n_tables} tables x {cms.n_slots} slots")
print(f"Exact count of 7: {counter[7]}")
print(f"CMS estimate of 7: {cms[7]}")
print(f"Total count: {cms.total()}")
# Cosine similarity between two CMS
cms_a = sketch.Counter(epsilon=0.001, delta=0.01, seed=0)
cms_b = sketch.Counter(epsilon=0.001, delta=0.01, seed=7)
for _ in range(5000):
cms_a.update(rng.randint(-1000, 1000))
for _ in range(5000):
cms_b.update(rng.randint(-1000, 1000))
# Dot product
dot = cms_a @ cms_b
print(f"Approximate dot product: {dot}")