Implementation:InternLM Lmdeploy Benchmark Serving
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Orchestration, API |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
A benchmark orchestration script that automates the full lifecycle of server-based benchmarking: launching an API server, waiting for readiness, running client benchmarks, and cleaning up.
Description
The benchmark_serving.py script provides end-to-end automation for benchmarking lmdeploy (and other backends) by managing both the server and client sides. It reads a YAML configuration file that defines server settings, engine configurations, and data parameters.
Key functions:
- get_launching_server_cmd: Constructs the appropriate server launch command for the given backend (lmdeploy, sglang, or vllm), translating config keys from snake_case to kebab-case CLI arguments.
- get_output_file: Generates a descriptive output filename that encodes the model name, backend, batch size, TP, DP, EP, and other parameters.
- get_server_ip_port: Determines the server IP and port from the configuration, supporting proxy server setups for data-parallel deployments.
- wait_server_ready: Polls the /v1/models endpoint using the OpenAI client until the server responds, indicating readiness.
- get_client_cmd: Constructs the client benchmark command that invokes profile_restful_api.py.
- benchmark: Orchestrates the full flow: starts the server process, waits for readiness, runs one or more client benchmark configs, and terminates the server.
- validate_config: Validates the YAML configuration structure, ensuring the required sections (api_server, engine, data) are present.
- main: Entry point that loads the YAML config, iterates over the engine configurations, and runs the benchmarks.
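The sketch below shows the snake_case to kebab-case translation that get_launching_server_cmd performs, for the lmdeploy case only. It is minimal and makes two assumptions. First, the helper names are hypothetical. Second, the boolean handling is assumed rather than documented behavior of the script: True becomes a bare switch, while False and None are dropped. The `lmdeploy serve api_server` entry point and its `--backend` option are part of the lmdeploy CLI.

```python
from typing import Any, Dict, List


def config_to_cli_args(config: Dict[str, Any]) -> List[str]:
    """Turn {'max_batch_size': 256} into ['--max-batch-size', '256'].

    Boolean handling is an assumption for illustration: True becomes a bare
    switch, False and None are omitted, everything else is stringified.
    """
    args: List[str] = []
    for key, value in config.items():
        flag = '--' + key.replace('_', '-')  # snake_case -> kebab-case
        if value is None or value is False:
            continue  # nothing to pass on the command line
        if value is True:
            args.append(flag)  # store_true style switch
        else:
            args.extend([flag, str(value)])
    return args


def lmdeploy_server_cmd(model_path: str, backend: str,
                        server_config: Dict[str, Any]) -> List[str]:
    """Hypothetical lmdeploy-only counterpart of get_launching_server_cmd."""
    # `lmdeploy serve api_server <model> --backend <turbomind|pytorch>` is the
    # lmdeploy CLI entry point; the remaining options come from the config.
    return ['lmdeploy', 'serve', 'api_server', model_path,
            '--backend', backend] + config_to_cli_args(server_config)
```

With server_config = {'server_port': 23333, 'tp': 1}, this yields `lmdeploy serve api_server <model> --backend turbomind --server-port 23333 --tp 1`.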
Multi-engine support: The config can specify multiple engine configurations as a list, and the script will sequentially benchmark each one.
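Both engine and data may be either a single mapping or a list, so a loop over engine configurations has to normalize them first. The sketch below uses two hypothetical helpers, as_list and plan_runs. It pairs each engine entry (one server launch) with every data entry, matching the benchmark flow of running one or more client configs per server.

```python
from typing import Any, Dict, List, Tuple, Union

Section = Union[Dict[str, Any], List[Dict[str, Any]]]


def as_list(section: Section) -> List[Dict[str, Any]]:
    """Normalize a config section that may be one dict or a list of dicts."""
    return [section] if isinstance(section, dict) else list(section)


def plan_runs(config: Dict[str, Any]
              ) -> List[Tuple[Dict[str, Any], List[Dict[str, Any]]]]:
    """Pair every engine config with the full list of data (client) configs.

    Each engine entry stands for one server launch; all data configs run
    against that server before the next engine is benchmarked.
    """
    data_configs = as_list(config['data'])
    return [(engine, data_configs) for engine in as_list(config['engine'])]
```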
Proxy server support: For data-parallel deployments, the script can route through a proxy server and send termination signals to all DP ranks.
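Endpoint selection with an optional proxy can be sketched as below. The key name proxy_url is hypothetical; the config table only says that the server section may hold a proxy URL. The server_ip and server_port keys follow the example YAML, and the fallbacks 0.0.0.0 and 23333 are lmdeploy's api_server defaults.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlsplit


def resolve_endpoint(server_config: Dict[str, Any]) -> Tuple[str, int]:
    """Return the (ip, port) that benchmark clients should send requests to.

    'proxy_url' is a hypothetical key: when set, traffic is routed through
    the proxy in front of the DP ranks instead of a single api_server.
    """
    proxy_url = server_config.get('proxy_url')
    if proxy_url:
        parts = urlsplit(proxy_url)
        if parts.hostname is None or parts.port is None:
            raise ValueError(f'proxy_url needs an explicit host and port: {proxy_url!r}')
        return parts.hostname, parts.port
    # No proxy: talk to the api_server directly.
    return (server_config.get('server_ip', '0.0.0.0'),
            int(server_config.get('server_port', 23333)))
```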
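The command-line interface is built with python-fire, so the flags in the usage example (--backend, --config_path) map onto the parameters of main.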
Usage
Use this script to automate benchmark runs: specify a backend and a YAML config file, and the script manages the server lifecycle (launch, readiness check, and shutdown) automatically.
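The launch, wait, run, and terminate sequence that benchmark follows can be sketched with the standard subprocess module. This illustrates the sequence under assumptions and is not the script's code. run_lifecycle is a hypothetical name, the readiness check is passed in as a callable, and the 30-second grace period before kill() is an arbitrary choice.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def run_lifecycle(server_cmd: Sequence[str],
                  client_cmds: List[Sequence[str]],
                  is_ready: Callable[[], bool]) -> Tuple[List[int], int]:
    """Launch the server, wait for it, run each client command, then shut down.

    Returns the client exit codes and the server's exit code.
    """
    server = subprocess.Popen(list(server_cmd))
    client_codes: List[int] = []
    try:
        if not is_ready():
            raise RuntimeError('server did not become ready')
        for cmd in client_cmds:
            # Client benchmarks run one after another against the same server.
            client_codes.append(subprocess.run(list(cmd)).returncode)
    finally:
        # Always stop the server, even if a client run raised.
        server.terminate()
        try:
            server.wait(timeout=30)
        except subprocess.TimeoutExpired:
            server.kill()  # escalate if SIGTERM is ignored
            server.wait()
    return client_codes, server.returncode
```

The server is stopped in a finally block because otherwise a failed client run would leave the API server running and holding its GPUs and port.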
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: benchmark/benchmark_serving.py
- Lines: 1-219
Signature
```python
from typing import Dict, List, Optional, Tuple

def get_launching_server_cmd(model_path, backend, server_config) -> List[str]: ...
def get_output_file(model_path, backend, server_config) -> str: ...
def get_server_ip_port(backend: str, server_config: Dict) -> Tuple[str, int]: ...
def wait_server_ready(server_ip: str, server_port: int) -> bool: ...
def get_client_cmd(backend: str, server_ip: str, server_port: int,
                   client_config: Dict) -> List[str]: ...
def benchmark(model_path: str, backend: str, server_config: Dict,
              data_config: Dict | List[Dict]): ...
def validate_config(config: Dict) -> None: ...
def main(backend: str, config_path: str, model_path: Optional[str] = None): ...
```
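The readiness check behind wait_server_ready amounts to polling /v1/models until the server answers. The script does this with the OpenAI client, but the sketch below swaps in urllib from the standard library so that it has no third-party dependency. The function names, the 600-second timeout, and the 5-second interval are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def models_endpoint_up(server_ip: str, server_port: int, timeout: float = 2.0) -> bool:
    """Single probe: does GET /v1/models answer with HTTP 200?"""
    url = f'http://{server_ip}:{server_port}/v1/models'
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors
        # (URLError and HTTPError are OSError subclasses).
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 600.0,
                     interval: float = 5.0) -> bool:
    """Call probe() every `interval` seconds until it succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The two compose as `wait_until_ready(lambda: models_endpoint_up('0.0.0.0', 23333))`.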
Import
```python
import fire
import yaml
# Standalone script, invoked via python-fire
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| backend | str | Yes | Backend: turbomind, pytorch, sglang, or vllm |
| config_path | str | Yes | Path to YAML configuration file |
| model_path | str | No | Overrides the model path set in the YAML config |
The YAML config file must contain:
| Section | Type | Description |
|---|---|---|
| server | dict | Server settings (IP, port, proxy URL) |
| engine | dict or list | Engine configuration(s) with model_path, tp, dp, max_batch_size, etc. |
| data | dict or list | Client benchmark configuration(s) with dataset, num_prompts, request_rate, etc. |
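A structural check in the spirit of validate_config might look like the sketch below. It uses the section names from the table above (server, engine, data). For engine and data, it accepts either a mapping or a non-empty list of mappings. The script's validate_config returns None, whereas this hypothetical config_errors collects every problem into a list so that all of them can be reported at once.

```python
from typing import Any, List

REQUIRED_SECTIONS = ('server', 'engine', 'data')


def config_errors(config: Any) -> List[str]:
    """Return human-readable problems with a loaded YAML config ([] if valid)."""
    if not isinstance(config, dict):
        return ['top level of the YAML file must be a mapping']
    errors = [f'missing section: {name}' for name in REQUIRED_SECTIONS
              if name not in config]
    if 'server' in config and not isinstance(config['server'], dict):
        errors.append("'server' must be a mapping")
    for name in ('engine', 'data'):
        if name not in config:
            continue  # already reported as missing
        section = config[name]
        # Either one mapping or a non-empty list of mappings is accepted.
        entries = [section] if isinstance(section, dict) else section
        if (not isinstance(entries, list) or not entries
                or not all(isinstance(e, dict) for e in entries)):
            errors.append(f"'{name}' must be a mapping or a non-empty list of mappings")
    return errors
```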
Outputs
| Name | Type | Description |
|---|---|---|
| CSV file | file | Benchmark results with descriptive filename encoding configuration parameters |
| Console output | text | Server startup logs and benchmark progress |
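The source does not give the exact filename layout produced by get_output_file, so the sketch below uses a hypothetical one. The name starts with a benchmark_ prefix, followed by the last component of the model path, the backend, and then bs, tp, dp, and ep fields. It assumes that tp, dp, and ep default to 1 when absent.

```python
import os
from typing import Any, Dict


def output_filename(model_path: str, backend: str, engine_config: Dict[str, Any],
                    suffix: str = '.csv') -> str:
    """Build a results filename that records the main run parameters.

    The field order and the bs/tp/dp/ep prefixes are a hypothetical layout.
    """
    model_name = os.path.basename(model_path.rstrip('/'))  # drop org/dir prefix
    fields = [
        'benchmark', model_name, backend,
        f"bs{engine_config.get('max_batch_size', 'default')}",
        f"tp{engine_config.get('tp', 1)}",
        f"dp{engine_config.get('dp', 1)}",
        f"ep{engine_config.get('ep', 1)}",
    ]
    return '_'.join(fields) + suffix
```

For internlm/internlm2_5-7b-chat on turbomind with max_batch_size 256 and tp 1, it returns benchmark_internlm2_5-7b-chat_turbomind_bs256_tp1_dp1_ep1.csv.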
Usage Examples
```shell
# Run a benchmark with a config file
python benchmark/benchmark_serving.py \
    --backend turbomind \
    --config_path benchmark/config/example.yaml
```

```yaml
# Example YAML config
server:
  server_ip: 0.0.0.0
  server_port: 23333
engine:
  model_path: internlm/internlm2_5-7b-chat
  tp: 1
  max_batch_size: 256
data:
  dataset_name: random
  random_input_len: 1024
  random_output_len: 512
  num_prompts: 1000
  request_rate: inf
```
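On the client side, a builder in the style of get_client_cmd can turn one data entry, such as the one in the example YAML, into a profile_restful_api.py invocation. This is a sketch under stated assumptions. The function name client_cmd and the script path are hypothetical. It also assumes that the client accepts --backend, --host, and --port, plus kebab-case versions of the data keys.

```python
import sys
from typing import Any, Dict, List


def client_cmd(backend: str, server_ip: str, server_port: int,
               data_config: Dict[str, Any],
               script: str = 'benchmark/profile_restful_api.py') -> List[str]:
    """Build a client benchmark invocation from one `data` entry.

    Assumed (not confirmed) flags: --backend/--host/--port, plus each data
    key converted to kebab-case, e.g. num_prompts -> --num-prompts.
    """
    cmd = [sys.executable, script, '--backend', backend,
           '--host', server_ip, '--port', str(server_port)]
    for key, value in data_config.items():
        # str() keeps values such as request_rate 'inf' readable as text.
        cmd += ['--' + key.replace('_', '-'), str(value)]
    return cmd
```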