Implementation:InternLM Lmdeploy Serve Proxy
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, Infrastructure |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Concrete tool, provided by the LMDeploy library, for distributing requests across multiple LMDeploy API server instances.
Description
The proxy() function and its CLI wrapper, lmdeploy serve proxy, launch a load-balancing proxy server that routes client requests across multiple api_server nodes. API server instances register themselves with the proxy, and the proxy distributes incoming requests according to a configurable routing strategy: 'random', 'min_expected_latency', or 'min_observed_latency'.
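The routing strategies can be illustrated with a small sketch. This is not LMDeploy's implementation: the NodeStats class, the pick_node function, the node URLs, and the latency formulas (queue depth divided by throughput for the expected latency, a plain mean for the observed latency) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeStats:
    """Hypothetical per-node bookkeeping; not LMDeploy's actual data model."""
    unfinished: int = 0   # requests currently in flight on this node
    speed: float = 1.0    # assumed throughput estimate (requests per second)
    latencies: List[float] = field(default_factory=list)  # recent latencies (s)


def pick_node(nodes: Dict[str, NodeStats], strategy: str = 'random') -> str:
    """Return the URL of the node that should receive the next request."""
    if not nodes:
        raise ValueError('no api_server nodes registered')
    if strategy == 'random':
        return random.choice(list(nodes))
    if strategy == 'min_expected_latency':
        # Assumed estimate: queue depth divided by throughput.
        return min(nodes, key=lambda url: nodes[url].unfinished / nodes[url].speed)
    if strategy == 'min_observed_latency':
        # Mean of recent latencies; a node without samples counts as 0.0.
        def mean_latency(url: str) -> float:
            samples = nodes[url].latencies
            return sum(samples) / len(samples) if samples else 0.0
        return min(nodes, key=mean_latency)
    raise ValueError(f'unknown routing strategy: {strategy}')


# Two made-up nodes: node-a is faster per request but has a longer queue.
nodes = {
    'http://node-a:23333': NodeStats(unfinished=4, speed=2.0, latencies=[0.9, 1.1]),
    'http://node-b:23333': NodeStats(unfinished=1, speed=1.0, latencies=[1.5, 1.7]),
}
```

With these made-up numbers, the expected-latency rule prefers node-b (shorter queue relative to throughput), while the observed-latency rule prefers node-a (lower mean latency), which shows how the two strategies can disagree.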
Usage
Use this when scaling LLM serving across multiple GPU nodes. Start multiple api_server instances first, then launch the proxy to provide a unified endpoint.
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/serve/proxy/proxy.py
- Lines: L830-841 (proxy function)
- CLI: lmdeploy/cli/serve.py L160-198
Signature
def proxy(server_name: str = '0.0.0.0',
          server_port: int = 8000,
          serving_strategy: Literal['Hybrid', 'DistServe'] = 'Hybrid',
          routing_strategy: Literal['random', 'min_expected_latency',
                                    'min_observed_latency'] = 'random',
          api_keys: Optional[Union[List[str], str]] = None,
          ssl: bool = False) -> None:
Import
from lmdeploy.serve.proxy.proxy import proxy
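A minimal sketch of launching the proxy from Python rather than the CLI, assuming the signature shown above. The helper names build_proxy_kwargs and launch_proxy are hypothetical, and the sketch assumes that proxy() blocks for as long as the server is running.

```python
from typing import Any, Dict

# Values taken from the routing_strategy Literal in the signature above.
ROUTING_STRATEGIES = ('random', 'min_expected_latency', 'min_observed_latency')


def build_proxy_kwargs(port: int = 8000, routing: str = 'random') -> Dict[str, Any]:
    """Hypothetical helper: validate the options and build keyword arguments."""
    if routing not in ROUTING_STRATEGIES:
        raise ValueError(f'unknown routing strategy: {routing}')
    return {
        'server_name': '0.0.0.0',
        'server_port': port,
        'serving_strategy': 'Hybrid',
        'routing_strategy': routing,
    }


def launch_proxy(port: int = 8000, routing: str = 'random') -> None:
    """Hypothetical wrapper around proxy(); assumed to block while serving."""
    # Imported here so the helper above stays usable without lmdeploy installed.
    from lmdeploy.serve.proxy.proxy import proxy
    proxy(**build_proxy_kwargs(port, routing))


# Example call (requires lmdeploy to be installed):
# launch_proxy(port=8000, routing='min_expected_latency')
```

Validating the strategy name before calling proxy() surfaces typos immediately instead of at server start-up.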
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| server_name | str | No | Host binding (default: '0.0.0.0') |
| server_port | int | No | Port (default: 8000) |
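| serving_strategy | str | No | 'Hybrid' or 'DistServe' (prefill-decode disaggregation); default: 'Hybrid' |
| routing_strategy | str | No | 'random', 'min_expected_latency', or 'min_observed_latency'; default: 'random' |
| api_keys | Optional[Union[List[str], str]] | No | A single API key or a list of API keys (default: None) |
| ssl | bool | No | Whether to enable SSL (default: False) |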
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP Proxy | Running Process | Load-balancing proxy on host:port routing to api_server nodes |
Usage Examples
CLI Launch
# Start proxy with default settings
lmdeploy serve proxy --server-port 8000
# With latency-based routing
lmdeploy serve proxy \
    --server-port 8000 \
    --routing-strategy min_expected_latency
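Once the proxy is running, clients send requests to it instead of to an individual api_server. The sketch below builds such a request using only the standard library. It assumes that the proxy forwards an OpenAI-style /v1/chat/completions route; the function name build_chat_request, the model name, and the URL are placeholders, so check them against your deployment.

```python
import json
import urllib.request


def build_chat_request(proxy_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat request addressed to the proxy.

    The /v1/chat/completions path is an assumption about the proxy's routes.
    """
    payload = {
        'model': model,
        'messages': [{'role': 'user', 'content': prompt}],
    }
    return urllib.request.Request(
        url=proxy_url.rstrip('/') + '/v1/chat/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# 'my-model' is a placeholder for the model name served by the api_server nodes.
req = build_chat_request('http://localhost:8000', 'my-model', 'Hello')

# To send it (requires a running proxy with at least one registered node):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```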