
Implementation:LMCache LMCache Blend Example Script

From Leeroopedia


Knowledge Sources
Domains Benchmarking, Inference_Optimization
Last Updated 2026-02-09 00:00 GMT

Overview

An example script for validating CacheBlend performance by running sequential inference requests whose context segments are reordered between requests.

Description

The blend.py example script demonstrates the full CacheBlend workflow: setting up environment variables, creating a vLLM LLM instance with the LMCache connector, building prompts from separator-delimited segments, running multiple requests (a warmup, a first request that computes and stores segment KV caches, a second that blends the segments in a new order, and a third that validates reuse), and measuring generation time for each request.

This is a Pattern Doc documenting the expected usage pattern for CacheBlend.
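The prompt-construction side of this pattern can be sketched as follows. This is a minimal illustration, not the script itself: the helper name `build_prompt` and the example chunk strings are hypothetical, and the separator must match whatever string LMCache is configured to blend on.

```python
# Sketch of the separator-delimited prompt pattern used by CacheBlend:
# segments are joined with a special string so their KV caches can be
# split, stored, and reused even when the segment order changes.
SEP = " # # "  # must match the blend separator configured for LMCache

def build_prompt(sys_prompt, chunks, question, sep=SEP):
    """Join system prompt, context chunks, and question with the separator."""
    return sep.join([sys_prompt, *chunks, question])

chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
first = build_prompt("You are a helpful assistant.", chunks, "Question?")
# The second request reorders the chunks; the segments themselves are
# unchanged, which is what lets CacheBlend reuse their stored KV caches.
second = build_prompt("You are a helpful assistant.", chunks[::-1], "Question?")
```

The prompts differ as strings, but split on the separator they contain exactly the same segments, which is the property the second (blend) request exploits.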

Usage

Run the script directly to validate CacheBlend operation.

Code Reference

Source Location

  • Repository: LMCache
  • File: examples/blend_kv_v1/blend.py
  • Lines: L1-L217

Signature

def setup_environment_variables(
    use_disk: bool = False,
    blend_special_str: str = " # # ",
    enable_sparse: bool = False,
) -> None:
    """Set LMCACHE_* environment variables for CacheBlend."""

@contextlib.contextmanager
def build_llm_with_lmcache(lmcache_connector: str, model: str):
    """Create vLLM LLM with LMCache connector, cleanup on exit."""

def main():
    """Run CacheBlend validation: warmup, store, blend, validate."""
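A body for `setup_environment_variables` might look like the sketch below. The `LMCACHE_*` variable names here are assumptions based on common LMCache configuration keys, not a copy of the script; verify them against your installed LMCache version before relying on them.

```python
import os

def setup_environment_variables(
    use_disk: bool = False,
    blend_special_str: str = " # # ",
    enable_sparse: bool = False,
) -> None:
    """Sketch: set LMCACHE_* variables before the LLM is constructed.
    Variable names are illustrative assumptions; confirm against your
    LMCache version's configuration reference."""
    os.environ["LMCACHE_ENABLE_BLENDING"] = "True"
    os.environ["LMCACHE_BLEND_SPECIAL_STR"] = blend_special_str
    if use_disk:
        # Disk backend: spill KV caches to a local directory.
        os.environ["LMCACHE_LOCAL_DISK"] = "file://local_disk/"
        os.environ["LMCACHE_MAX_LOCAL_DISK_SIZE"] = "10"
    else:
        # Default CPU backend: keep KV caches in host memory.
        os.environ["LMCACHE_LOCAL_CPU"] = "True"
        os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"
```

Setting these before constructing the LLM matters because the LMCache connector reads its configuration at startup.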

Invocation

python examples/blend_kv_v1/blend.py --model mistralai/Mistral-7B-Instruct-v0.2

I/O Contract

Inputs

Name Type Required Description
--model str No Model name (default: mistralai/Mistral-7B-Instruct-v0.2)
--use-disk flag No Use disk backend instead of CPU
--blend-special-str str No Separator string (default: " # # ")
--enable-sparse flag No Enable sparse attention with FlashInfer
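The inputs table maps directly onto an argparse interface. The sketch below mirrors the table's flag names and defaults; it is an illustration of the CLI contract, not the script's actual parser.

```python
import argparse

def parse_args(argv=None):
    """Sketch of a CLI matching the inputs table above (flag names and
    defaults taken from the table; parser structure is illustrative)."""
    parser = argparse.ArgumentParser(
        description="CacheBlend validation example")
    parser.add_argument("--model", type=str,
                        default="mistralai/Mistral-7B-Instruct-v0.2",
                        help="Model name")
    parser.add_argument("--use-disk", action="store_true",
                        help="Use disk backend instead of CPU")
    parser.add_argument("--blend-special-str", type=str, default=" # # ",
                        help="Separator string between segments")
    parser.add_argument("--enable-sparse", action="store_true",
                        help="Enable sparse attention with FlashInfer")
    return parser.parse_args(argv)
```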

Outputs

Name Type Description
console output text Generated text and timing for each request (warmup, first, second, third)
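The per-request timings in the console output can be collected with a small wall-clock wrapper like the one below; the helper name `timed` is hypothetical and stands in for whatever timing the script does around each `generate` call.

```python
import time

def timed(fn, *args, **kwargs):
    """Run one request and return (result, elapsed_seconds), mirroring
    the per-request timing reported in the console output."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Usage: `outputs, elapsed = timed(llm.generate, prompts, sampling_params)` would yield the generation result together with the time printed for that request.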

Usage Examples

Basic CacheBlend Validation

# Run with default settings
python examples/blend_kv_v1/blend.py

# Run with custom model and disk backend
python examples/blend_kv_v1/blend.py \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --use-disk

Expected Output Pattern

# warmup request: ~10s (cold start)
# first request: ~5s (computes and stores segments)
# second request: ~2s (reuses segments with reordered chunks)
# third request: ~2s (reuses segments again)
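One way to turn the pattern above into an automated check is to compare the store-request time against the blend-request time. The function and the 1.5x threshold below are assumptions chosen for illustration, not values from the script; the absolute timings depend heavily on hardware and model.

```python
def check_blend_speedup(t_first: float, t_second: float,
                        min_speedup: float = 1.5) -> bool:
    """Heuristic check: the blended (second) request should be
    meaningfully faster than the first (store) request.
    The 1.5x threshold is an illustrative assumption."""
    return t_first / t_second >= min_speedup
```

With the illustrative timings above (~5s first, ~2s second), the blended request is about 2.5x faster, which would pass this check.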

Related Pages

Implements Principle

Requires Environment
