Implementation:LMCache LMCache Bloom Filter

Knowledge Sources	LMCache
Domains	Data Structures, Probabilistic Algorithms
Last Updated	2026-02-09 00:00 GMT

Overview

BloomFilter provides a memory-efficient probabilistic set membership data structure for LMCache.

Description

The BloomFilter class implements a standard Bloom filter using a compact bit array backed by Python's array.array with 32-bit unsigned integers. It computes optimal bit array size and hash count from the expected number of elements and desired false positive rate. Hashing uses SHA-256 with salt-based multiple hash generation, supporting both string and integer items. The class provides batch insertion with deduplication counting via add_batch_with_hashes_and_check, which returns the number of genuinely new items added. Statistics methods report memory usage, fill rate, bit count, and other diagnostic information.

Usage

Use the BloomFilter for fast, space-efficient approximate membership testing of cache keys. It is suitable for scenarios where the system needs to quickly determine whether a key likely exists before performing an expensive remote lookup, trading a small false positive rate for significant memory savings compared to exact set storage.

Code Reference

Source Location

Repository: LMCache
File: lmcache/v1/utils/bloom_filter.py
Lines: 1-109

Signature

class BloomFilter:
    def __init__(self, expected_elements: int = 1000000, false_positive_rate: float = 0.01) -> None: ...
    def add(self, item: Union[str, int]) -> None: ...
    def add_batch_with_hashes_and_check(self, positions_list: list[list[int]]) -> int: ...
    def contains(self, item: Union[str, int]) -> bool: ...
    def clear(self) -> None: ...
    def get_memory_usage_bytes(self) -> int: ...
    def get_bit_set(self) -> int: ...
    def get_fill_rate(self) -> float: ...
    def get_statistics(self) -> dict: ...

Import

from lmcache.v1.utils.bloom_filter import BloomFilter

I/O Contract

Inputs

Name	Type	Required	Description
expected_elements	int	No (default 1000000)	Expected number of elements to be inserted
false_positive_rate	float	No (default 0.01)	Desired false positive probability
item	Union[str, int]	Yes (for add/contains)	The item to add or check membership for
positions_list	list[list[int]]	Yes (for add_batch_with_hashes_and_check)	Pre-computed hash positions for batch insertion

Outputs

Name	Type	Description
contains return	bool	True if the item is possibly in the set (may be false positive), False if definitely not in the set
add_batch_with_hashes_and_check return	int	Number of genuinely new items added in the batch
get_statistics return	dict	Dictionary with size_mb, hash_count, item_count, bits_set, fill_rate, expected_elements, and false_positive_rate

Usage Examples

from lmcache.v1.utils.bloom_filter import BloomFilter

# Create a Bloom filter for 1 million elements with 1% false positive rate
bf = BloomFilter(expected_elements=1_000_000, false_positive_rate=0.01)

# Add items
bf.add("cache_key_1")
bf.add(42)

# Check membership
if bf.contains("cache_key_1"):
    print("Key likely exists")

if not bf.contains("nonexistent_key"):
    print("Key definitely does not exist")

# Get diagnostics
stats = bf.get_statistics()
print(f"Memory: {stats['size_mb']:.2f} MB, Fill rate: {stats['fill_rate']:.2%}")

# Clear and reuse
bf.clear()

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment