Implementation:InternLM Lmdeploy Serve Proxy

Knowledge Sources	LMDeploy Proxy Server
Domains	LLM_Serving, Infrastructure
Last Updated	2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for distributing requests across multiple LMDeploy API server instances.

Description

The proxy() function and its CLI wrapper, lmdeploy serve proxy, launch a load-balancing proxy server that routes client requests across multiple api_server nodes. API server instances register themselves with the proxy, and the proxy distributes incoming requests according to the configured routing strategy.
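To make the idea of a routing strategy concrete, here is a minimal sketch of how a proxy could choose a node under two of the strategy names listed in the signature. It is an illustration only, not LMDeploy's implementation: the pick_node function, the latency bookkeeping, and the example node URLs are all hypothetical, and min_expected_latency is left out because this page does not say how the expectation is computed.

```python
import random
from typing import Dict, Optional


def pick_node(latencies: Dict[str, Optional[float]],
              strategy: str = 'random',
              rng: Optional[random.Random] = None) -> str:
    """Pick one registered node URL (hypothetical helper, not LMDeploy code).

    `latencies` maps a node URL to its last observed latency in seconds,
    or None if the node has not served a request yet.
    """
    if not latencies:
        raise ValueError('no api_server nodes registered')
    urls = sorted(latencies)  # stable order so ties resolve predictably
    if strategy == 'random':
        return (rng or random).choice(urls)
    if strategy == 'min_observed_latency':
        # Assumption: a node with no measurement counts as 0.0 so it gets tried.
        return min(urls, key=lambda url: latencies[url] or 0.0)
    raise ValueError(f'unsupported strategy in this sketch: {strategy}')


# Hypothetical node URLs and latencies.
nodes = {'http://node-a:23333': 0.8, 'http://node-b:23333': 0.2}
print(pick_node(nodes, 'min_observed_latency'))
```

Random routing needs no state, while latency-based routing requires the proxy to keep per-node measurements; that trade-off is why the strategy is a launch-time option.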

Usage

Use this when scaling LLM serving across multiple GPU nodes. Start multiple api_server instances first, then launch the proxy to provide a unified endpoint.

Code Reference

Source Location

Repository: lmdeploy
File: lmdeploy/serve/proxy/proxy.py
Lines: L830-841 (proxy function)
CLI: lmdeploy/cli/serve.py L160-198

Signature

def proxy(server_name: str = '0.0.0.0',
          server_port: int = 8000,
          serving_strategy: Literal['Hybrid', 'DistServe'] = 'Hybrid',
          routing_strategy: Literal['random', 'min_expected_latency',
                                     'min_observed_latency'] = 'random',
          api_keys: Optional[Union[List[str], str]] = None,
          ssl: bool = False) -> None:

Import

from lmdeploy.serve.proxy.proxy import proxy
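As a rough sketch of programmatic use, the snippet below validates the strategy names against the Literal values from the signature before handing them to proxy(). The helpers build_proxy_kwargs and launch_proxy are hypothetical names introduced for illustration; only the proxy import and its parameters come from this page, and calling launch_proxy assumes lmdeploy is installed.

```python
from typing import Any, Dict, Literal, get_args

# Allowed values copied from the proxy() signature on this page.
ServingStrategy = Literal['Hybrid', 'DistServe']
RoutingStrategy = Literal['random', 'min_expected_latency',
                          'min_observed_latency']


def build_proxy_kwargs(server_name: str = '0.0.0.0',
                       server_port: int = 8000,
                       serving_strategy: str = 'Hybrid',
                       routing_strategy: str = 'random') -> Dict[str, Any]:
    """Hypothetical helper: reject unknown strategy names early."""
    if serving_strategy not in get_args(ServingStrategy):
        raise ValueError(f'unknown serving_strategy: {serving_strategy}')
    if routing_strategy not in get_args(RoutingStrategy):
        raise ValueError(f'unknown routing_strategy: {routing_strategy}')
    return {
        'server_name': server_name,
        'server_port': server_port,
        'serving_strategy': serving_strategy,
        'routing_strategy': routing_strategy,
    }


def launch_proxy(**overrides: Any) -> None:
    """Hypothetical wrapper: start the proxy with validated arguments."""
    # Imported lazily so the validation above works without lmdeploy installed.
    from lmdeploy.serve.proxy.proxy import proxy
    proxy(**build_proxy_kwargs(**overrides))


print(build_proxy_kwargs(routing_strategy='min_observed_latency'))
```

Calling launch_proxy(server_port=8000) would then start the proxy with the defaults shown in the signature.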

I/O Contract

Inputs

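Name	Type	Required	Description
server_name	str	No	Host binding (default: '0.0.0.0')
server_port	int	No	Port (default: 8000)
serving_strategy	str	No	'Hybrid' or 'DistServe' (prefill-decode disaggregation) (default: 'Hybrid')
routing_strategy	str	No	'random', 'min_expected_latency', or 'min_observed_latency' (default: 'random')
api_keys	Optional[Union[List[str], str]]	No	A single API key or a list of API keys (default: None)
ssl	bool	No	Whether to enable SSL (default: False)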

Outputs

Name	Type	Description
HTTP Proxy	Running Process	Load-balancing proxy on host:port routing to api_server nodes

Usage Examples

CLI Launch

# Start proxy with default settings
lmdeploy serve proxy --server-port 8000

# With latency-based routing
lmdeploy serve proxy \
    --server-port 8000 \
    --routing-strategy min_expected_latency
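When the proxy is started from a Python script rather than a shell, the same command line can be assembled as an argument list. The sketch below uses only the two flags shown in the CLI examples; the proxy_argv helper is a hypothetical name, and actually running the command assumes the lmdeploy CLI is on the PATH.

```python
import shlex
from typing import List, Optional


def proxy_argv(server_port: int = 8000,
               routing_strategy: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the `lmdeploy serve proxy` argument list."""
    argv = ['lmdeploy', 'serve', 'proxy', '--server-port', str(server_port)]
    if routing_strategy is not None:
        argv += ['--routing-strategy', routing_strategy]
    return argv


argv = proxy_argv(8000, 'min_expected_latency')
print(shlex.join(argv))
# To launch for real (blocks while the proxy runs):
#   import subprocess
#   subprocess.run(argv, check=True)
```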

Related Pages

Implements Principle

Principle:InternLM_Lmdeploy_Load_Balancing_Proxy
