Implementation:LMCache LMCache Disagg Proxy Server
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Serving |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A FastAPI application that routes requests through the disaggregated prefill-decode pipeline.
Description
disagg_proxy_server.py implements a FastAPI application with /v1/chat/completions and /v1/completions endpoints. It does three things:

- manages httpx async clients for the prefiller and decoder instances;
- runs a ZMQ PULL server that receives ProxyNotif messages, which prefillers send when a NIXL KV transfer completes;
- balances requests round-robin across multiple prefiller and decoder instances.
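The round-robin selection described above can be sketched as follows. The sketch assumes each role keeps its own rotating list of base URLs. RoundRobinPool and the second prefiller URL (port 8101) are hypothetical and are not taken from disagg_proxy_server.py.

```python
import itertools
from typing import Iterator, List, Sequence


class RoundRobinPool:
    """Hypothetical helper: hands out endpoints in strict rotation."""

    def __init__(self, endpoints: Sequence[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self.endpoints: List[str] = list(endpoints)
        # itertools.cycle repeats the list forever: a, b, a, b, ...
        self._it: Iterator[str] = itertools.cycle(self.endpoints)

    def next_endpoint(self) -> str:
        return next(self._it)


# One pool per role, so prefill and decode traffic rotate independently.
prefillers = RoundRobinPool(["http://localhost:8100", "http://localhost:8101"])
decoders = RoundRobinPool(["http://localhost:8200"])

for _ in range(3):
    print(prefillers.next_endpoint(), "->", decoders.next_endpoint())
```

The proxy serves requests on a single asyncio event loop, and next() on the cycle never awaits. Because of that, this style of selection needs no lock.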
Usage
Launch the proxy server before starting the vLLM instances. Configure it with CLI arguments that specify the prefiller and decoder endpoints.
Code Reference
Source Location
- Repository: LMCache
- File: examples/disagg_prefill/disagg_proxy_server.py
- Lines: L33-L629
Signature
```python
from fastapi import FastAPI, Request

# FastAPI application with two endpoints; `lifespan` is defined in the source file (elided here).
app = FastAPI(lifespan=lifespan)

@app.post("/v1/chat/completions")
async def handle_chat_completions(request: Request):
    """Route chat completion through prefill-decode pipeline."""
    ...

@app.post("/v1/completions")
async def handle_completions(request: Request):
    """Route text completion through prefill-decode pipeline."""
    ...
```
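Both handlers route through the same prefill-decode pipeline. The sketch below shows one plausible shape for that pipeline, under assumptions that the source does not confirm:

- a prefill-only request (max_tokens set to 1) goes to the prefiller;
- the proxy then waits until a ProxyNotif for that request ID arrives;
- finally, the original request is streamed from the decoder.

KVReadyTracker, route_request, and the injected send_prefill and stream_decode callables are all hypothetical. The real file uses httpx clients and a ZMQ PULL socket; this sketch substitutes plain coroutines so that it runs with the standard library alone.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict


class KVReadyTracker:
    """Hypothetical: one asyncio.Event per request ID, set when its ProxyNotif arrives."""

    def __init__(self) -> None:
        self._events: Dict[str, asyncio.Event] = {}

    def _event(self, req_id: str) -> asyncio.Event:
        # Create on first access so a notification that arrives early is not lost.
        event = self._events.get(req_id)
        if event is None:
            event = self._events[req_id] = asyncio.Event()
        return event

    def mark_ready(self, req_id: str) -> None:
        # In the real server this role is played by the ZMQ PULL loop.
        self._event(req_id).set()

    async def wait_ready(self, req_id: str, timeout: float) -> None:
        try:
            await asyncio.wait_for(self._event(req_id).wait(), timeout)
        finally:
            self._events.pop(req_id, None)  # done with this request either way


async def route_request(
    req_id: str,
    payload: dict,
    send_prefill: Callable[[dict], Awaitable[None]],
    stream_decode: Callable[[dict], AsyncIterator[bytes]],
    tracker: KVReadyTracker,
    timeout: float = 30.0,
) -> AsyncIterator[bytes]:
    # 1. Prefill only (assumption): one output token is enough to build the KV cache.
    await send_prefill({**payload, "max_tokens": 1, "stream": False})
    # 2. Wait until the prefiller reports that the KV transfer has finished.
    await tracker.wait_ready(req_id, timeout)
    # 3. Send the original request to the decoder and relay its stream unchanged.
    async for chunk in stream_decode(payload):
        yield chunk
```

A FastAPI handler could wrap the async generator returned by route_request in a StreamingResponse. Creating the event on first access avoids a race when the notification arrives before the handler starts waiting. One gap remains: a notification that arrives after its request has already timed out leaves a stale entry, which a real server would need to clean up.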
Import
```bash
python examples/disagg_prefill/disagg_proxy_server.py \
  --port 8000 \
  --prefiller-host localhost --prefiller-port 8100 \
  --decoder-host localhost --decoder-port 8200 \
  --proxy-port 7500
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --port | int | Yes | Proxy listen port |
| --prefiller-host | str | Yes | Prefiller vLLM host |
| --prefiller-port | int | Yes | Prefiller vLLM port |
| --decoder-host | str | Yes | Decoder vLLM host |
| --decoder-port | int | Yes | Decoder vLLM port |
| --proxy-port | int | Yes | ZMQ PULL port for ProxyNotif messages |
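The inputs above map naturally onto argparse. This sketch mirrors the flags documented on this page and builds the prefiller and decoder base URLs from them. The table does not list --proxy-host, --num-prefillers, or --num-decoders, so the defaults given for those three flags are assumptions; the real script may define them differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Flags mirror this page; defaults marked "assumed" are not from the source.
    parser = argparse.ArgumentParser(description="Disaggregated prefill/decode proxy (sketch)")
    parser.add_argument("--port", type=int, required=True, help="HTTP listen port of the proxy")
    parser.add_argument("--prefiller-host", type=str, required=True)
    parser.add_argument("--prefiller-port", type=int, required=True)
    parser.add_argument("--decoder-host", type=str, required=True)
    parser.add_argument("--decoder-port", type=int, required=True)
    parser.add_argument("--proxy-host", type=str, default="localhost")  # assumed default
    parser.add_argument("--proxy-port", type=int, required=True, help="ZMQ PULL port for ProxyNotif")
    parser.add_argument("--num-prefillers", type=int, default=1)  # assumed default
    parser.add_argument("--num-decoders", type=int, default=1)  # assumed default
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse([
    "--port", "8000",
    "--prefiller-host", "localhost", "--prefiller-port", "8100",
    "--decoder-host", "localhost", "--decoder-port", "8200",
    "--proxy-port", "7500",
])
# argparse turns dashes into underscores: --prefiller-host -> args.prefiller_host.
prefiller_url = f"http://{args.prefiller_host}:{args.prefiller_port}"
decoder_url = f"http://{args.decoder_host}:{args.decoder_port}"
print(prefiller_url, decoder_url, args.proxy_port)
```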
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP response | StreamingResponse | Streamed chat or text completion response from the decoder |
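vLLM's OpenAI-compatible server streams output as server-sent events. Each event is a "data:" line carrying a JSON chunk, and the stream ends with "data: [DONE]". Assuming the proxy relays the decoder's bytes unchanged, a client can recover the generated text as sketched below. iter_sse_text is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield generated text from OpenAI-style SSE lines ('data: {...}')."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # /v1/completions chunks carry "text"; chat chunks carry "delta".
            if "text" in choice:
                yield choice["text"]
            else:
                content = choice.get("delta", {}).get("content")
                if content:
                    yield content


sample = [
    'data: {"choices": [{"index": 0, "text": "Hello"}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " world"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # Hello world
```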
Usage Examples
Launch Proxy
```bash
python examples/disagg_prefill/disagg_proxy_server.py \
  --port 8000 \
  --prefiller-host localhost --prefiller-port 8100 \
  --decoder-host localhost --decoder-port 8200 \
  --proxy-host localhost --proxy-port 7500 \
  --num-prefillers 1 --num-decoders 1
```
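Once the proxy is running, a client can send requests to it as it would to a single OpenAI-compatible server. The sketch below uses only the standard library and assumes the proxy listens on port 8000, as in the launch command. The model name is a placeholder that must match the model served by the vLLM instances. build_request and stream_completion are hypothetical helpers.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:8000/v1/completions"  # --port 8000 from the launch command


def build_request(prompt: str, model: str, max_tokens: int = 32) -> urllib.request.Request:
    body = {
        "model": model,  # placeholder: must match the model served by vLLM
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,  # ask for a streamed response
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_completion(prompt: str, model: str) -> None:
    req = build_request(prompt, model)
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # the response object yields raw byte lines
            line = raw.decode("utf-8").strip()
            if line:
                print(line)


# Requires a running proxy and vLLM instances:
# stream_completion("Explain KV cache reuse in one sentence.", model="your-model-name")
```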