Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Online ml River Sketch HeavyHitters

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Streaming_Algorithms, Frequent_Items
Last Updated 2026-02-08 16:00 GMT

Overview

Lossy Count with Forgetting algorithm for tracking frequent items in data streams with recency bias.

Description

Identifies heavy hitters (frequent items) in streams using the Lossy Count algorithm with a forgetting factor. Automatically discards infrequent items to maintain bounded memory while tracking items exceeding a support threshold. The fading factor gives more weight to recent observations, making it adaptive to changing patterns.

Usage

Use for identifying trending topics, frequent patterns, or anomalous items in evolving streams. Ideal for network monitoring (frequent IPs), market basket analysis (frequent itemsets), or social media trend detection where recency matters.

Code Reference

Source Location

Signature

class HeavyHitters(base.Base):
    def __init__(
        self,
        support: float = 0.001,
        epsilon: float = 0.005,
        fading_factor: float = 0.999
    ):
        ...

    def update(self, x: typing.Hashable):
        ...

    def __getitem__(self, index) -> float:
        ...

    def most_common(self, n: int | None = None) -> list[tuple[typing.Hashable, float]]:
        ...

Import

from river import sketch

I/O Contract

Parameter Type Description
support float Minimum frequency threshold (0-1)
epsilon float Error tolerance (should be < support)
fading_factor float Forgetting factor for recency (0-1)

Usage Examples

import random
import string
from river import sketch

rng = random.Random(42)
hh = sketch.HeavyHitters()

# Feed random ASCII characters
for _ in range(10000):
    hh.update(rng.choice(string.printable))

# Get top frequent items
print("Top 5 heavy hitters:")
for item, freq in hh.most_common(5):
    print(f"  '{item}': {freq:.2f}")

# Check specific item frequency
print(f"\nFrequency of 'A': {hh['A']:.2f}")
print(f"Frequency of unseen item: {hh[(1, 2, 3)]:.2f}")

# Adapt to changing distribution
print("\nSimulating distribution shift...")
hh2 = sketch.HeavyHitters(support=0.01, epsilon=0.005, fading_factor=0.99)

# First period: mostly 'a'
for _ in range(5000):
    hh2.update('a')

# Second period: mostly 'b'
for _ in range(5000):
    hh2.update('b')

print("Most common after shift:")
for item, freq in hh2.most_common(3):
    print(f"  '{item}': {freq:.2f}")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment