Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:LMCache LMCache Bloom Filter

From Leeroopedia


Knowledge Sources
Domains Data Structures, Probabilistic Algorithms
Last Updated 2026-02-09 00:00 GMT

Overview

BloomFilter provides a memory-efficient probabilistic set membership data structure for LMCache.

Description

The BloomFilter class implements a standard Bloom filter using a compact bit array backed by Python's array.array with 32-bit unsigned integers. It computes optimal bit array size and hash count from the expected number of elements and desired false positive rate. Hashing uses SHA-256 with salt-based multiple hash generation, supporting both string and integer items. The class provides batch insertion with deduplication counting via add_batch_with_hashes_and_check, which returns the number of genuinely new items added. Statistics methods report memory usage, fill rate, bit count, and other diagnostic information.

Usage

Use the BloomFilter for fast, space-efficient approximate membership testing of cache keys. It is suitable for scenarios where the system needs to quickly determine whether a key likely exists before performing an expensive remote lookup, trading a small false positive rate for significant memory savings compared to exact set storage.

Code Reference

Source Location

Signature

class BloomFilter:
    def __init__(self, expected_elements: int = 1000000, false_positive_rate: float = 0.01) -> None: ...
    def add(self, item: Union[str, int]) -> None: ...
    def add_batch_with_hashes_and_check(self, positions_list: list[list[int]]) -> int: ...
    def contains(self, item: Union[str, int]) -> bool: ...
    def clear(self) -> None: ...
    def get_memory_usage_bytes(self) -> int: ...
    def get_bit_set(self) -> int: ...
    def get_fill_rate(self) -> float: ...
    def get_statistics(self) -> dict: ...

Import

from lmcache.v1.utils.bloom_filter import BloomFilter

I/O Contract

Inputs

Name Type Required Description
expected_elements int No (default 1000000) Expected number of elements to be inserted
false_positive_rate float No (default 0.01) Desired false positive probability
item Union[str, int] Yes (for add/contains) The item to add or check membership for
positions_list list[list[int]] Yes (for add_batch_with_hashes_and_check) Pre-computed hash positions for batch insertion

Outputs

Name Type Description
contains return bool True if the item is possibly in the set (may be false positive), False if definitely not in the set
add_batch_with_hashes_and_check return int Number of genuinely new items added in the batch
get_statistics return dict Dictionary with size_mb, hash_count, item_count, bits_set, fill_rate, expected_elements, and false_positive_rate

Usage Examples

from lmcache.v1.utils.bloom_filter import BloomFilter

# Create a Bloom filter for 1 million elements with 1% false positive rate
bf = BloomFilter(expected_elements=1_000_000, false_positive_rate=0.01)

# Add items
bf.add("cache_key_1")
bf.add(42)

# Check membership
if bf.contains("cache_key_1"):
    print("Key likely exists")

if not bf.contains("nonexistent_key"):
    print("Key definitely does not exist")

# Get diagnostics
stats = bf.get_statistics()
print(f"Memory: {stats['size_mb']:.2f} MB, Fill rate: {stats['fill_rate']:.2%}")

# Clear and reuse
bf.clear()

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment