Implementation:InternLM Lmdeploy Serve Proxy

Knowledge Sources
Domains: LLM_Serving, Infrastructure
Last Updated: 2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for distributing requests across multiple LMDeploy API server instances.

Description

The proxy() function and its CLI wrapper, lmdeploy serve proxy, launch a load-balancing proxy server that routes client requests across multiple api_server nodes. API server instances register themselves with the proxy, and the proxy distributes incoming requests according to the configured routing strategy ('random', 'min_expected_latency', or 'min_observed_latency').
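To make the idea of a routing strategy concrete, here is a minimal sketch of how a proxy could choose a node under 'random' and 'min_observed_latency'. It is illustrative only and is not LMDeploy's implementation: Node, pick_node, the latency bookkeeping, and the example URLs are all hypothetical. 'min_expected_latency' is left out because this page does not describe how the expectation is computed.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical record for one registered api_server instance."""
    url: str
    # Recently observed request latencies in seconds (assumed bookkeeping).
    latencies: List[float] = field(default_factory=list)

    def mean_latency(self) -> float:
        # A node without samples is treated as 0.0 so that it gets tried first.
        if not self.latencies:
            return 0.0
        return sum(self.latencies) / len(self.latencies)


def pick_node(nodes: Dict[str, Node],
              strategy: str = 'random',
              rng: Optional[random.Random] = None) -> Node:
    """Select the node that should receive the next request."""
    if not nodes:
        raise ValueError('no api_server nodes are registered')
    candidates = list(nodes.values())
    if strategy == 'random':
        # Uniform choice; an explicit rng makes the choice reproducible.
        return (rng or random).choice(candidates)
    if strategy == 'min_observed_latency':
        # Prefer the node with the lowest mean observed latency.
        return min(candidates, key=lambda node: node.mean_latency())
    raise ValueError(f'strategy not covered by this sketch: {strategy!r}')
```

In this sketch a uniform random choice spreads load without any bookkeeping, while the latency-based choice needs per-node measurements but can steer traffic away from slow nodes.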

Usage

Use this when scaling LLM serving across multiple GPU nodes. Start multiple api_server instances first, then launch the proxy to provide a unified endpoint.
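Once the proxy is running, clients talk to it instead of to individual api_server nodes. The sketch below builds such a request with the Python standard library. It assumes that the proxy forwards the OpenAI-style /v1/chat/completions route commonly served by api_server; that route, the helper names, and the placeholder model name are assumptions rather than details stated on this page.

```python
import json
import urllib.request


def build_chat_request(proxy_url: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat request aimed at the proxy endpoint."""
    payload = {
        'model': model,  # placeholder; use a model served by your nodes
        'messages': [{'role': 'user', 'content': prompt}],
    }
    return urllib.request.Request(
        # Assumed route: OpenAI-compatible chat completions.
        url=proxy_url.rstrip('/') + '/v1/chat/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

With a proxy listening on port 8000, send(build_chat_request('http://localhost:8000', 'my-model', 'Hello')) would issue the call; the client never needs to know which node served it.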

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/serve/proxy/proxy.py
  • Lines: L830-841 (proxy function)
  • CLI: lmdeploy/cli/serve.py L160-198

Signature

def proxy(server_name: str = '0.0.0.0',
          server_port: int = 8000,
          serving_strategy: Literal['Hybrid', 'DistServe'] = 'Hybrid',
          routing_strategy: Literal['random', 'min_expected_latency',
                                     'min_observed_latency'] = 'random',
          api_keys: Optional[Union[List[str], str]] = None,
          ssl: bool = False) -> None:

Import

from lmdeploy.serve.proxy.proxy import proxy
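The function can also be called from Python rather than through the CLI. The following is a minimal sketch that validates the two strategy arguments against the values in the signature above before handing them to proxy(). The helpers build_proxy_kwargs and launch_proxy are hypothetical wrappers, not part of LMDeploy, and the call is expected (not confirmed here) to block for as long as the server runs.

```python
from typing import Any, Dict

# Allowed values copied from the Literal types in the signature.
SERVING_STRATEGIES = ('Hybrid', 'DistServe')
ROUTING_STRATEGIES = ('random', 'min_expected_latency', 'min_observed_latency')


def build_proxy_kwargs(server_name: str = '0.0.0.0',
                       server_port: int = 8000,
                       serving_strategy: str = 'Hybrid',
                       routing_strategy: str = 'random') -> Dict[str, Any]:
    """Check the strategy names and return keyword arguments for proxy()."""
    if serving_strategy not in SERVING_STRATEGIES:
        raise ValueError(f'unknown serving_strategy: {serving_strategy!r}')
    if routing_strategy not in ROUTING_STRATEGIES:
        raise ValueError(f'unknown routing_strategy: {routing_strategy!r}')
    return {
        'server_name': server_name,
        'server_port': server_port,
        'serving_strategy': serving_strategy,
        'routing_strategy': routing_strategy,
    }


def launch_proxy(**kwargs: Any) -> None:
    """Start the proxy; requires lmdeploy to be installed."""
    # Imported lazily so build_proxy_kwargs works without lmdeploy.
    from lmdeploy.serve.proxy.proxy import proxy
    proxy(**kwargs)
```

A typical call would be launch_proxy(**build_proxy_kwargs(routing_strategy='min_expected_latency')); validating first surfaces a misspelled strategy before any server starts.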

I/O Contract

Inputs

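Name | Type | Required | Description
server_name | str | No | Host binding (default: '0.0.0.0')
server_port | int | No | Port (default: 8000)
serving_strategy | str | No | 'Hybrid' or 'DistServe' (prefill-decode disaggregation); default: 'Hybrid'
routing_strategy | str | No | 'random', 'min_expected_latency', or 'min_observed_latency'; default: 'random'
api_keys | Optional[Union[List[str], str]] | No | A single API key or a list of API keys (default: None)
ssl | bool | No | Whether to enable SSL (default: False)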

Outputs

Name | Type | Description
HTTP Proxy | Running Process | Load-balancing proxy on host:port routing to api_server nodes

Usage Examples

CLI Launch

# Start proxy with default settings
lmdeploy serve proxy --server-port 8000

# With latency-based routing
lmdeploy serve proxy \
    --server-port 8000 \
    --routing-strategy min_expected_latency

Related Pages

Implements Principle
