Implementation:InternLM Lmdeploy Benchmark Serving

Knowledge Sources	InternLM_Lmdeploy
Domains	Benchmarking, Orchestration, API
Last Updated	2026-02-07 15:00 GMT

Overview

A benchmark orchestration script that automates the full lifecycle of server-based benchmarking: launching an API server, waiting for readiness, running client benchmarks, and cleaning up.

Description

The benchmark_serving.py script provides end-to-end automation for benchmarking lmdeploy (and other backends) by managing both the server and client sides. It reads a YAML configuration file that defines server settings, engine configurations, and data parameters.

Key functions:

get_launching_server_cmd: Constructs the server launch command for the given backend (lmdeploy's turbomind or pytorch engine, sglang, or vllm), translating config keys from snake_case to kebab-case CLI arguments.
get_output_file: Generates a descriptive output filename encoding model name, backend, batch size, TP, DP, EP, and other parameters.
get_server_ip_port: Determines the server IP and port from configuration, supporting proxy server setups for data-parallel deployments.
wait_server_ready: Polls the /v1/models endpoint using the OpenAI client until the server responds, indicating readiness.
get_client_cmd: Constructs the client benchmark command that invokes profile_restful_api.py.
benchmark: Orchestrates the full flow: start server process, wait for readiness, run one or more client benchmark configs, and terminate the server.
validate_config: Validates the YAML configuration structure, ensuring required sections (api_server, engine, data) are present.
main: Entry point that loads the YAML config, iterates over engine configurations, and runs benchmarks.
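
The snake_case to kebab-case translation mentioned for get_launching_server_cmd can be illustrated with a minimal sketch. The helper name config_to_cli_args and the treatment of an empty string as a value-less switch are assumptions made for illustration, not the script's confirmed behavior.

```python
from typing import Dict, List


def config_to_cli_args(config: Dict[str, object]) -> List[str]:
    """Turn {'max_batch_size': 256} into ['--max-batch-size', '256']."""
    args: List[str] = []
    for key, value in config.items():
        # snake_case config key -> kebab-case command-line flag
        args.append('--' + key.replace('_', '-'))
        # Assumption: an empty string marks a switch that takes no value.
        if str(value) != '':
            args.append(str(value))
    return args


print(config_to_cli_args({'tp': 1, 'max_batch_size': 256}))
# ['--tp', '1', '--max-batch-size', '256']
```

A launch command would then be the backend-specific prefix followed by these arguments.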

Multi-engine support: The config can specify multiple engine configurations as a list, and the script will sequentially benchmark each one.
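
Accepting either a single mapping or a list of mappings usually comes down to normalizing the section before iterating. The sketch below shows one way to do that; as_config_list is a hypothetical helper, not a function from the script.

```python
from typing import Dict, List, Union


def as_config_list(section: Union[Dict, List[Dict]]) -> List[Dict]:
    """Wrap a single mapping in a list so callers can always iterate."""
    if isinstance(section, dict):
        return [section]
    return list(section)


engines = as_config_list([{'tp': 1}, {'tp': 2}])
for index, engine in enumerate(engines):
    # A real run would launch a server and benchmark it at this point.
    print(f'run {index}: tp={engine["tp"]}')
```

The same normalization would apply to the data section, which may also be a dict or a list.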

Proxy server support: For data-parallel deployments, the script can route through a proxy server and send termination signals to all DP ranks.
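
As a rough illustration of resolving the address the client should target, the sketch below prefers a proxy URL when one is configured and otherwise falls back to the server's own IP and port. The key name proxy_url and the helper name resolve_server_address are assumptions; the exact conditions in get_server_ip_port may differ (for example, it may also check the dp setting).

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def resolve_server_address(server_config: Dict) -> Tuple[str, int]:
    """Return (host, port) of the endpoint the benchmark client should hit."""
    proxy_url = server_config.get('proxy_url')  # hypothetical key name
    if proxy_url:
        parts = urlsplit(proxy_url)
        if parts.hostname is None or parts.port is None:
            raise ValueError(f'proxy_url needs a scheme, host and port: {proxy_url}')
        return parts.hostname, parts.port
    # No proxy configured: talk to the API server directly.
    return server_config['server_ip'], int(server_config['server_port'])


print(resolve_server_address({'server_ip': '0.0.0.0', 'server_port': 23333}))
# ('0.0.0.0', 23333)
```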

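The script exposes main as its command-line interface via python-fire, so the parameters of main map to the flags --backend, --config_path, and --model_path.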

Usage

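Use this script to automate benchmark runs by specifying a backend and a YAML config file. It starts the server, waits for it to become ready, runs the client benchmarks, and shuts the server down without manual intervention.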
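
The server lifecycle can be sketched with the standard library alone. This is a simplified illustration, not the script's code: the names wait_until and run_with_server are hypothetical, and the readiness probe is passed in as a callable, whereas the script itself polls the /v1/models endpoint with the OpenAI client.

```python
import subprocess
import sys
import time
from typing import Callable, List


def wait_until(probe: Callable[[], bool], timeout_s: float,
               interval_s: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the server is still starting
        time.sleep(interval_s)
    return False


def run_with_server(server_cmd: List[str], probe: Callable[[], bool],
                    client_cmds: List[List[str]],
                    timeout_s: float = 60.0) -> List[int]:
    """Start the server, wait for readiness, run each client, always clean up."""
    server = subprocess.Popen(server_cmd)
    try:
        if not wait_until(probe, timeout_s):
            raise TimeoutError('server did not become ready in time')
        # Run the client benchmarks one after another against the same server.
        return [subprocess.run(cmd).returncode for cmd in client_cmds]
    finally:
        # Terminate even if a client failed, and escalate if the server hangs.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()


# Stand-in commands so the sketch runs without a GPU or a model.
fake_server = [sys.executable, '-c', 'import time; time.sleep(30)']
fake_client = [sys.executable, '-c', 'print("benchmark done")']
print(run_with_server(fake_server, lambda: True, [fake_client]))
```

Putting the cleanup in a finally block means the server process is stopped whether the clients succeed, fail, or the readiness check times out.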

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: benchmark/benchmark_serving.py
Lines: 1-219

Signature

def get_launching_server_cmd(model_path, backend, server_config) -> List[str]: ...

def get_output_file(model_path, backend, server_config) -> str: ...

def get_server_ip_port(backend: str, server_config: Dict) -> Tuple[str, int]: ...

def wait_server_ready(server_ip: str, server_port: int) -> bool: ...

def get_client_cmd(backend: str, server_ip: str, server_port: int,
                   client_config: Dict) -> List[str]: ...

def benchmark(model_path: str, backend: str, server_config: Dict,
              data_config: Dict | List[Dict]): ...

def validate_config(config: Dict) -> None: ...

def main(backend: str, config_path: str, model_path: Optional[str] = None): ...

Import

import fire
import yaml
# Standalone script, invoked via python-fire

I/O Contract

Inputs

Name	Type	Required	Description
backend	str	Yes	Backend: turbomind, pytorch, sglang, or vllm
config_path	str	Yes	Path to YAML configuration file
model_path	str	No	Model path; when given, it overrides the model_path set in the config file

The YAML config file must contain:

Section	Type	Description
server	dict	Server settings (IP, port, proxy URL)
engine	dict or list	Engine configuration(s) with model_path, tp, dp, max_batch_size, etc.
data	dict or list	Client benchmark configuration(s) with dataset, num_prompts, request_rate, etc.
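
A check for required top-level sections might look like the sketch below. The helper check_required_sections is hypothetical and takes the section names as an argument, since the exact names and exception type used by validate_config should be confirmed against the script.

```python
from typing import Dict, Iterable


def check_required_sections(config: Dict, required: Iterable[str]) -> None:
    """Raise ValueError naming every required top-level section that is absent."""
    missing = [name for name in required if name not in config]
    if missing:
        raise ValueError('config is missing section(s): ' + ', '.join(missing))


# Passes silently: every listed section is present.
check_required_sections({'server': {}, 'engine': {}, 'data': {}},
                        ['server', 'engine', 'data'])
```

Reporting all missing sections at once saves a round trip compared with failing on the first one.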

Outputs

Name	Type	Description
CSV file	file	Benchmark results with descriptive filename encoding configuration parameters
Console output	text	Server startup logs and benchmark progress
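
To give a feel for a filename that encodes the run configuration, here is a small sketch. The layout (model name, backend, then short tags such as bs, tp, dp, ep) and the helper name build_result_filename are assumptions for illustration; the actual format produced by get_output_file may differ.

```python
import os
from typing import Dict


def build_result_filename(model_path: str, backend: str, engine_config: Dict) -> str:
    """Join the model name, backend, and selected engine settings into a CSV name."""
    model_name = os.path.basename(model_path.rstrip('/'))
    parts = [model_name, backend]
    # Hypothetical tag scheme: only settings present in the config are encoded.
    for key, tag in (('max_batch_size', 'bs'), ('tp', 'tp'), ('dp', 'dp'), ('ep', 'ep')):
        if key in engine_config:
            parts.append(f'{tag}{engine_config[key]}')
    return '_'.join(parts) + '.csv'


print(build_result_filename('internlm/internlm2_5-7b-chat', 'turbomind',
                            {'tp': 1, 'max_batch_size': 256}))
# internlm2_5-7b-chat_turbomind_bs256_tp1.csv
```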

Usage Examples

# Run benchmark with config file
# python benchmark/benchmark_serving.py \
#     --backend turbomind \
#     --config_path benchmark/config/example.yaml

# Example YAML config:
# server:
#   server_ip: 0.0.0.0
#   server_port: 23333
# engine:
#   model_path: internlm/internlm2_5-7b-chat
#   tp: 1
#   max_batch_size: 256
# data:
#   dataset_name: random
#   random_input_len: 1024
#   random_output_len: 512
#   num_prompts: 1000
#   request_rate: inf

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
