Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Online ml River Sketch Counter

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Streaming_Algorithms, Approximate_Counting
Last Updated 2026-02-08 16:00 GMT

Overview

Count-Min Sketch algorithm for approximate frequency counting with fixed memory footprint.

Description

Implements the Count-Min Sketch (CMS) algorithm for counting hashable elements using a fixed-size matrix of counters. Provides approximate counts with probabilistic guarantees controlled by epsilon (error rate) and delta (confidence). Supports dot product operations for similarity computations and handles both positive and negative updates (turnstile model).

Usage

Use for frequency estimation in streams when exact counting requires too much memory, such as tracking IP addresses, URLs, or words in text streams. Ideal for heavy hitters detection, frequency-based filtering, or approximate join size estimation.

Code Reference

Source Location

Signature

class Counter(base.Base):
    def __init__(self, epsilon: float = 0.1, delta: float = 0.05, seed: int | None = None):
        ...

    def update(self, x: typing.Hashable, w: int = 1):
        ...

    def __getitem__(self, x) -> int:
        ...

    def __matmul__(self, other: Counter) -> int:
        ...

    def total(self) -> int:
        ...

Import

from river import sketch

I/O Contract

Parameter Type Description
epsilon float Approximation error parameter (default: 0.1)
delta float Confidence parameter (default: 0.05)
seed int or None Random seed for hash functions

Usage Examples

import collections
import random
from river import sketch

cms = sketch.Counter(epsilon=0.005, seed=0)
rng = random.Random(7)

# Compare with exact counter
counter = collections.Counter()

for _ in range(10000):
    v = rng.randint(-1000, 1000)
    cms.update(v)
    counter[v] += 1

print(f"CMS dimensions: {cms.n_tables} tables x {cms.n_slots} slots")
print(f"Exact count of 7: {counter[7]}")
print(f"CMS estimate of 7: {cms[7]}")
print(f"Total count: {cms.total()}")

# Cosine similarity between two CMS
cms_a = sketch.Counter(epsilon=0.001, delta=0.01, seed=0)
cms_b = sketch.Counter(epsilon=0.001, delta=0.01, seed=7)

for _ in range(5000):
    cms_a.update(rng.randint(-1000, 1000))

for _ in range(5000):
    cms_b.update(rng.randint(-1000, 1000))

# Dot product
dot = cms_a @ cms_b
print(f"Approximate dot product: {dot}")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment