Implementation: LMCache Blend Example Script
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Inference_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
An example script that validates CacheBlend performance by running sequential inference requests whose context segments are reordered between requests.
Description
The blend.py example script demonstrates the full CacheBlend workflow: setting up environment variables, creating a vLLM LLM instance with the LMCache connector, building prompts with separator-delimited segments, running multiple requests (warmup, first/store, second/blend, third/validate), and measuring generation times.
This is a Pattern Doc documenting the expected usage pattern for CacheBlend.
Usage
Run the script directly to validate CacheBlend operation.
Code Reference
Source Location
- Repository: LMCache
- File: examples/blend_kv_v1/blend.py
- Lines: L1-L217
Signature
def setup_environment_variables(
use_disk: bool = False,
blend_special_str: str = " # # ",
enable_sparse: bool = False,
) -> None:
"""Set LMCACHE_* environment variables for CacheBlend."""
@contextlib.contextmanager
def build_llm_with_lmcache(lmcache_connector: str, model: str):
"""Create vLLM LLM with LMCache connector, cleanup on exit."""
def main():
"""Run CacheBlend validation: warmup, store, blend, validate."""
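A minimal sketch of how `setup_environment_variables` might populate the process environment before vLLM is created. The specific `LMCACHE_*` names and values below are assumptions for illustration, not the script's exact configuration:

```python
import os

def setup_environment_variables(
    use_disk: bool = False,
    blend_special_str: str = " # # ",
    enable_sparse: bool = False,
) -> None:
    """Set LMCACHE_* environment variables for CacheBlend (illustrative)."""
    os.environ["LMCACHE_CHUNK_SIZE"] = "256"          # KV chunk granularity (assumed value)
    os.environ["LMCACHE_ENABLE_BLENDING"] = "True"    # turn on CacheBlend
    os.environ["LMCACHE_BLEND_SPECIAL_STR"] = blend_special_str
    if use_disk:
        # Disk backend instead of CPU memory (path and size are placeholders).
        os.environ["LMCACHE_LOCAL_DISK"] = "file://local_disk/"
        os.environ["LMCACHE_MAX_LOCAL_DISK_SIZE"] = "10"
    else:
        os.environ["LMCACHE_LOCAL_CPU"] = "True"
        os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"
    if enable_sparse:
        os.environ["LMCACHE_ENABLE_SPARSE"] = "True"  # hypothetical flag name
```

Setting these before constructing the LLM matters because LMCache reads its configuration from the environment when the connector initializes.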
Run
python examples/blend_kv_v1/blend.py --model mistralai/Mistral-7B-Instruct-v0.2
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model | str | No | Model name (default: mistralai/Mistral-7B-Instruct-v0.2) |
| --use-disk | flag | No | Use disk backend instead of CPU |
| --blend-special-str | str | No | Separator string (default: "# #") |
| --enable-sparse | flag | No | Enable sparse attention with FlashInfer |
Outputs
| Name | Type | Description |
|---|---|---|
| console output | text | Generated text and timing for each request (warmup, first, second, third) |
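The prompts behind these requests are built by joining independent context segments with the blend separator; the second request reorders the segments so CacheBlend can reuse each segment's stored KV at a new position. A minimal sketch (the helper name and segment contents are illustrative, not taken from blend.py):

```python
def build_prompt(segments, blend_special_str=" # # ", question=""):
    # Join independent context segments with the CacheBlend separator so
    # LMCache can cache and later reuse each segment's KV independently.
    return blend_special_str.join(segments) + blend_special_str + question

sys_prompt = "You are a helpful assistant."
chunk1 = "Document A: ..."
chunk2 = "Document B: ..."

# First request: computes and stores KV for every segment.
first = build_prompt([sys_prompt, chunk1, chunk2], question="Summarize A.")
# Second request: same segments, different order; stored KV is blended
# rather than recomputed from scratch.
second = build_prompt([sys_prompt, chunk2, chunk1], question="Summarize B.")
```

The separator must match `LMCACHE_BLEND_SPECIAL_STR` exactly, since it is how LMCache locates segment boundaries in the tokenized prompt.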
Usage Examples
Basic CacheBlend Validation
# Run with default settings
python examples/blend_kv_v1/blend.py
# Run with custom model and disk backend
python examples/blend_kv_v1/blend.py \
--model meta-llama/Llama-3.1-8B-Instruct \
--use-disk
Expected Output Pattern
# warmup request: ~10s (cold start)
# first request: ~5s (computes and stores segments)
# second request: ~2s (reuses segments with reordered chunks)
# third request: ~2s (reuses segments again)
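The per-request timings above can be collected with a small wrapper around `time.perf_counter`; the helper name is illustrative (blend.py times its `llm.generate` calls inline):

```python
import time

def timed(fn, *args, **kwargs):
    # Return (result, elapsed_seconds) for one call -- the pattern used to
    # compare warmup / store / blend request latencies.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage with a stand-in for llm.generate:
out, elapsed = timed(lambda p: p.upper(), "hello")
```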