Implementation:LMCache LMCache Bloom Filter
| Knowledge Sources | |
|---|---|
| Domains | Data Structures, Probabilistic Algorithms |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
BloomFilter provides a memory-efficient probabilistic set membership data structure for LMCache.
Description
The BloomFilter class implements a standard Bloom filter using a compact bit array backed by Python's array.array with 32-bit unsigned integers. It computes optimal bit array size and hash count from the expected number of elements and desired false positive rate. Hashing uses SHA-256 with salt-based multiple hash generation, supporting both string and integer items. The class provides batch insertion with deduplication counting via add_batch_with_hashes_and_check, which returns the number of genuinely new items added. Statistics methods report memory usage, fill rate, bit count, and other diagnostic information.
Usage
Use the BloomFilter for fast, space-efficient approximate membership testing of cache keys. It is suitable for scenarios where the system needs to quickly determine whether a key likely exists before performing an expensive remote lookup, trading a small false positive rate for significant memory savings compared to exact set storage.
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/v1/utils/bloom_filter.py
- Lines: 1-109
Signature
class BloomFilter:
def __init__(self, expected_elements: int = 1000000, false_positive_rate: float = 0.01) -> None: ...
def add(self, item: Union[str, int]) -> None: ...
def add_batch_with_hashes_and_check(self, positions_list: list[list[int]]) -> int: ...
def contains(self, item: Union[str, int]) -> bool: ...
def clear(self) -> None: ...
def get_memory_usage_bytes(self) -> int: ...
def get_bit_set(self) -> int: ...
def get_fill_rate(self) -> float: ...
def get_statistics(self) -> dict: ...
Import
from lmcache.v1.utils.bloom_filter import BloomFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| expected_elements | int | No (default 1000000) | Expected number of elements to be inserted |
| false_positive_rate | float | No (default 0.01) | Desired false positive probability |
| item | Union[str, int] | Yes (for add/contains) | The item to add or check membership for |
| positions_list | list[list[int]] | Yes (for add_batch_with_hashes_and_check) | Pre-computed hash positions for batch insertion |
Outputs
| Name | Type | Description |
|---|---|---|
| contains return | bool | True if the item is possibly in the set (may be false positive), False if definitely not in the set |
| add_batch_with_hashes_and_check return | int | Number of genuinely new items added in the batch |
| get_statistics return | dict | Dictionary with size_mb, hash_count, item_count, bits_set, fill_rate, expected_elements, and false_positive_rate |
Usage Examples
from lmcache.v1.utils.bloom_filter import BloomFilter
# Create a Bloom filter for 1 million elements with 1% false positive rate
bf = BloomFilter(expected_elements=1_000_000, false_positive_rate=0.01)
# Add items
bf.add("cache_key_1")
bf.add(42)
# Check membership
if bf.contains("cache_key_1"):
print("Key likely exists")
if not bf.contains("nonexistent_key"):
print("Key definitely does not exist")
# Get diagnostics
stats = bf.get_statistics()
print(f"Memory: {stats['size_mb']:.2f} MB, Fill rate: {stats['fill_rate']:.2%}")
# Clear and reuse
bf.clear()