Implementation: LMCache Blend Example Script
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Inference_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
An example script that validates CacheBlend performance by running sequential inference requests whose context segments are reordered between requests.
Description
The blend.py example script demonstrates the full CacheBlend workflow: setting up environment variables, creating a vLLM LLM instance with the LMCache connector, building prompts with separator-delimited segments, running multiple requests (warmup, first/store, second/blend, third/validate), and measuring generation times.
This is a Pattern Doc documenting the expected usage pattern for CacheBlend.
Usage
Run the script directly to validate CacheBlend operation.
Code Reference
Source Location
- Repository: LMCache
- File: examples/blend_kv_v1/blend.py
- Lines: L1-L217
Signature
def setup_environment_variables(
use_disk: bool = False,
blend_special_str: str = " # # ",
enable_sparse: bool = False,
) -> None:
"""Set LMCACHE_* environment variables for CacheBlend."""
@contextlib.contextmanager
def build_llm_with_lmcache(lmcache_connector: str, model: str):
"""Create vLLM LLM with LMCache connector, cleanup on exit."""
def main():
"""Run CacheBlend validation: warmup, store, blend, validate."""
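A minimal sketch of how `setup_environment_variables` might populate the process environment before vLLM is created. The specific `LMCACHE_*` names and values below are assumptions for illustration, not the script's exact configuration:

```python
import os

def setup_environment_variables(
    use_disk: bool = False,
    blend_special_str: str = " # # ",
    enable_sparse: bool = False,
) -> None:
    """Set LMCACHE_* environment variables for CacheBlend (illustrative)."""
    os.environ["LMCACHE_CHUNK_SIZE"] = "256"          # KV chunk granularity (assumed value)
    os.environ["LMCACHE_ENABLE_BLENDING"] = "True"    # turn on CacheBlend
    os.environ["LMCACHE_BLEND_SPECIAL_STR"] = blend_special_str
    if use_disk:
        # Disk backend instead of CPU memory (path and size are placeholders).
        os.environ["LMCACHE_LOCAL_DISK"] = "file://local_disk/"
        os.environ["LMCACHE_MAX_LOCAL_DISK_SIZE"] = "10"
    else:
        os.environ["LMCACHE_LOCAL_CPU"] = "True"
        os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"
    if enable_sparse:
        os.environ["LMCACHE_ENABLE_SPARSE"] = "True"  # hypothetical flag name
```

Setting these before constructing the LLM matters because LMCache reads its configuration from the environment when the connector initializes.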
Run
python examples/blend_kv_v1/blend.py --model mistralai/Mistral-7B-Instruct-v0.2
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model | str | No | Model name (default: mistralai/Mistral-7B-Instruct-v0.2) |
| --use-disk | flag | No | Use disk backend instead of CPU |
| --blend-special-str | str | No | Separator string (default: "# #") |
| --enable-sparse | flag | No | Enable sparse attention with FlashInfer |
Outputs
| Name | Type | Description |
|---|---|---|
| console output | text | Generated text and timing for each request (warmup, first, second, third) |
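The prompts behind these requests are built by joining independent context segments with the blend separator; the second request reorders the segments so CacheBlend can reuse each segment's stored KV at a new position. A minimal sketch (the helper name and segment contents are illustrative, not taken from blend.py):

```python
def build_prompt(segments, blend_special_str=" # # ", question=""):
    # Join independent context segments with the CacheBlend separator so
    # LMCache can cache and later reuse each segment's KV independently.
    return blend_special_str.join(segments) + blend_special_str + question

sys_prompt = "You are a helpful assistant."
chunk1 = "Document A: ..."
chunk2 = "Document B: ..."

# First request: computes and stores KV for every segment.
first = build_prompt([sys_prompt, chunk1, chunk2], question="Summarize A.")
# Second request: same segments, different order; stored KV is blended
# rather than recomputed from scratch.
second = build_prompt([sys_prompt, chunk2, chunk1], question="Summarize B.")
```

The separator must match `LMCACHE_BLEND_SPECIAL_STR` exactly, since it is how LMCache locates segment boundaries in the tokenized prompt.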
Usage Examples
Basic CacheBlend Validation
# Run with default settings
python examples/blend_kv_v1/blend.py
# Run with custom model and disk backend
python examples/blend_kv_v1/blend.py \
--model meta-llama/Llama-3.1-8B-Instruct \
--use-disk
Expected Output Pattern
# warmup request: ~10s (cold start)
# first request: ~5s (computes and stores segments)
# second request: ~2s (reuses segments with reordered chunks)
# third request: ~2s (reuses segments again)
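The per-request timings above can be collected with a small wrapper around `time.perf_counter`; the helper name is illustrative (blend.py times its `llm.generate` calls inline):

```python
import time

def timed(fn, *args, **kwargs):
    # Return (result, elapsed_seconds) for one call -- the pattern used to
    # compare warmup / store / blend request latencies.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage with a stand-in for llm.generate:
out, elapsed = timed(lambda p: p.upper(), "hello")
```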