Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Online ml River Sketch Set

From Leeroopedia
Revision as of 16:10, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Online_ml_River_Sketch_Set.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Online_Learning, Streaming_Algorithms, Bloom_Filters
Last Updated 2026-02-08 16:00 GMT

Overview

Bloom filter for approximate set membership testing with fixed memory and no false negatives.

Description

Implements a Bloom filter using a bit array and multiple hash functions for probabilistic set membership testing. Items are stored as binary codes by setting multiple bit positions. Membership queries never produce false negatives but may produce false positives (claiming an item exists when it doesn't). Supports set union and intersection operations between compatible filters.

Usage

Use for detecting duplicate items in streams, cache filtering, or database query optimization where false positives are acceptable but false negatives are not. Memory-efficient alternative to storing all seen items explicitly.

Code Reference

Source Location

Signature

class Set(base.Base):
    def __init__(self, capacity: int = 2048, fp_rate: float = 0.01, seed: int | None = None):
        ...

    def add(self, x: typing.Hashable):
        ...

    def update(self, values: typing.Iterable):
        ...

    def __contains__(self, x: typing.Hashable):
        ...

    def union(self, other: Set):
        ...

    def intersection(self, other: Set):
        ...

Import

from river import sketch

I/O Contract

Parameter Type Description
capacity int Maximum distinct items to store (default: 2048)
fp_rate float False positive rate (default: 0.01)
seed int or None Random seed for hash functions

Usage Examples

import random
from river import sketch

rng = random.Random(42)
s_set = sketch.Set(capacity=100, seed=0)

print(f"Hash functions: {s_set.n_hash}")
print(f"Bit array size: {s_set.n_bits}")

# Add items
for _ in range(1000):
    s_set.add(rng.randint(0, 200))

# Check membership
print(f"\n1 in set: {1 in s_set}")
print(f"-10 in set (false positive): {-10 in s_set}")

# Batch update
s_set.update([1, 2, 3, 4, 5, 6, 7])

# Set operations
s1 = sketch.Set(seed=8)
s2 = sketch.Set(seed=8)

for _ in range(1000):
    s1.add(rng.randint(0, 5000))
    s2.add(rng.randint(0, 5000))

# Intersection
s_int = s1 & s2
print(f"\n43 in s1: {43 in s1}")
print(f"43 in s2: {43 in s2}")
print(f"43 in intersection: {43 in s_int}")

# Union
s_union = s1 | s2
print(f"43 in union: {43 in s_union}")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment