Implementation:LMCache LMCache Disagg Proxy Server



Knowledge Sources
Domains Distributed_Systems, Serving
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool, provided as a FastAPI application, for routing requests through the disaggregated prefill-decode pipeline.

Description

disagg_proxy_server.py implements a FastAPI application that exposes the /v1/chat/completions and /v1/completions endpoints. It maintains httpx async clients for the prefiller and decoder instances, runs a ZMQ PULL server that receives ProxyNotif messages (sent by prefillers when a NIXL KV transfer completes), and load-balances round-robin across multiple prefiller and decoder instances.
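The two coordination ideas in this description, round-robin backend selection and waiting for a transfer notification, can be sketched with the standard library alone. This is an illustrative sketch, not the code in disagg_proxy_server.py: the names RoundRobin, NotifWaiter, and route are hypothetical, the ZMQ PULL handler is replaced by a plain method call, and the assumption that the proxy holds the decode step until the notification arrives is an inference from the description rather than something the source states.

```python
import asyncio
import itertools


class RoundRobin:
    """Cycle through a fixed list of backend URLs, one per call (hypothetical helper)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def next(self):
        return next(self._cycle)


class NotifWaiter:
    """Per-request events that a notification listener sets (hypothetical helper).

    Stands in for the ZMQ PULL handler: in the real proxy the signal arrives
    as a ProxyNotif message; here it is a direct call to notify().
    """

    def __init__(self):
        self._events = {}

    def _event(self, req_id):
        # Create the event on first use so notify() and wait() can race safely.
        return self._events.setdefault(req_id, asyncio.Event())

    def notify(self, req_id):
        self._event(req_id).set()

    async def wait(self, req_id, timeout=5.0):
        try:
            await asyncio.wait_for(self._event(req_id).wait(), timeout)
        finally:
            # Drop the entry whether the wait succeeded or timed out.
            self._events.pop(req_id, None)


async def route(req_id, prefillers, decoders, waiter):
    """Pick one backend of each kind, then wait for the transfer signal."""
    prefiller = prefillers.next()
    decoder = decoders.next()
    # A real proxy would send the prefill request to `prefiller` here.
    await waiter.wait(req_id)
    # Assumed ordering: only now would the request be forwarded to `decoder`.
    return prefiller, decoder
```

Keying the events by request ID keeps concurrent requests independent, and creating the event lazily means a notification that arrives before the waiter starts is not lost.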

Usage

Launch the proxy server before starting the vLLM instances. Configure it with CLI arguments that specify the prefiller and decoder endpoints.

Code Reference

Source Location

  • Repository: LMCache
  • File: examples/disagg_prefill/disagg_proxy_server.py
  • Lines: L33-L629

Signature

# FastAPI application with endpoints (excerpt; `lifespan` is defined elsewhere in the file):
from fastapi import FastAPI, Request

app = FastAPI(lifespan=lifespan)

@app.post("/v1/chat/completions")
async def handle_chat_completions(request: Request):
    """Route chat completion through prefill-decode pipeline."""

@app.post("/v1/completions")
async def handle_completions(request: Request):
    """Route text completion through prefill-decode pipeline."""

Import

The proxy is a standalone script, so it is launched from the command line rather than imported:

python examples/disagg_prefill/disagg_proxy_server.py \
    --port 8000 \
    --prefiller-host localhost --prefiller-port 8100 \
    --decoder-host localhost --decoder-port 8200 \
    --proxy-port 7500

I/O Contract

Inputs

Name                                  Type     Required  Description
--port                                int      Yes       Proxy listen port
--prefiller-host / --prefiller-port   str/int  Yes       Prefiller vLLM endpoint
--decoder-host / --decoder-port       str/int  Yes       Decoder vLLM endpoint
--proxy-port                          int      Yes       ZMQ PULL port for ProxyNotif messages
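For readers who want to see how the inputs above map onto a parser, here is a minimal argparse sketch. It covers only the flags listed in the table and marks them required as the table does; the function name build_parser and the help strings are illustrative, and the actual script's parser may define defaults or additional flags.

```python
import argparse


def build_parser():
    """Build a parser for the flags listed in the Inputs table (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Disaggregated prefill-decode proxy (sketch)"
    )
    parser.add_argument("--port", type=int, required=True,
                        help="Proxy listen port")
    parser.add_argument("--prefiller-host", type=str, required=True,
                        help="Host of the prefiller vLLM endpoint")
    parser.add_argument("--prefiller-port", type=int, required=True,
                        help="Port of the prefiller vLLM endpoint")
    parser.add_argument("--decoder-host", type=str, required=True,
                        help="Host of the decoder vLLM endpoint")
    parser.add_argument("--decoder-port", type=int, required=True,
                        help="Port of the decoder vLLM endpoint")
    parser.add_argument("--proxy-port", type=int, required=True,
                        help="ZMQ PULL port for ProxyNotif messages")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse turns dashes into underscores, e.g. args.prefiller_port.
    print(f"prefiller: http://{args.prefiller_host}:{args.prefiller_port}")
    print(f"decoder:   http://{args.decoder_host}:{args.decoder_port}")
```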

Outputs

Name           Type               Description
HTTP response  StreamingResponse  Streamed chat/completion response from decoder

Usage Examples

Launch Proxy

python examples/disagg_prefill/disagg_proxy_server.py \
    --port 8000 \
    --prefiller-host localhost --prefiller-port 8100 \
    --decoder-host localhost --decoder-port 8200 \
    --proxy-host localhost --proxy-port 7500 \
    --num-prefillers 1 --num-decoders 1
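Once the proxy is listening on port 8000, a client talks to it like any completions endpoint. The sketch below builds and streams a request to /v1/completions using only the standard library. The payload fields follow the common OpenAI-style completions format and are an assumption about what the backing vLLM instances accept; the model name is a placeholder, and build_completion_request and stream_completion are hypothetical helper names.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=32):
    """Build a POST request for the proxy's /v1/completions endpoint."""
    payload = {
        "model": model,          # placeholder; use the model the backends serve
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,          # the proxy returns a streamed response
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_completion(request):
    """Yield non-empty decoded lines from the streamed response as they arrive."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line


if __name__ == "__main__":
    req = build_completion_request("http://localhost:8000", "my-model", "Hello")
    for line in stream_completion(req):
        print(line)
```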

Related Pages

Implements Principle

Requires Environment
