Implementation:InternLM Lmdeploy Action Tools
| Knowledge Sources | |
|---|---|
| Domains | CI, Evaluation, Benchmarking |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
A collection of utility functions used by GitHub Actions workflows to run shell commands, evaluate models with OpenCompass, generate benchmark reports, and publish results to GitHub Step Summaries.
Description
The action_tools.py module provides helper functions that support the lmdeploy CI/CD pipeline. It includes:
- run_cmd: Executes shell commands with logging, capturing stdout/stderr to a file and returning the exit code.
- add_summary / _append_summary: Converts CSV data into Markdown table format and appends it to the GitHub Step Summary file (GITHUB_STEP_SUMMARY).
- evaluate: Orchestrates model evaluation using OpenCompass. It iterates over a list of models, copies and configures the evaluation config file, runs opencompass, and parses the resulting CSV summary into a consolidated output.
- create_model_links: Creates symbolic links for model directories, used to set up model paths in the test environment.
- generate_benchmark_report: Traverses benchmark result directories, merges CSV files using pandas, computes averages per group, and writes the results to GitHub Step Summary.
- generate_csv_from_profile_result: Converts JSONL profiling output into CSV format with columns for request rate, completed requests, RPM, median TTFT, and output throughput.
- generate_output_for_evaluation: Finds the latest summary CSV, transposes it, sorts it, and outputs the result to GitHub Step Summary.
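The JSONL-to-CSV conversion described for generate_csv_from_profile_result can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the module's actual code. The function name jsonl_to_csv, the JSONL key names (request_rate, completed, duration, median_ttft_ms, output_throughput) and the RPM formula are all assumptions made for the example; the real profiler output may use different fields.

```python
import csv
import json
from pathlib import Path

# Hypothetical column/key names; the real profiler output may differ.
COLUMNS = ["request_rate", "completed", "rpm", "median_ttft_ms", "output_throughput"]


def jsonl_to_csv(file_path: str, out_path: str) -> int:
    """Convert one-JSON-object-per-line profiling output to CSV; return row count."""
    rows = []
    for line in Path(file_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        rec = json.loads(line)
        # Assumption: RPM = completed requests per minute of benchmark duration.
        rpm = rec["completed"] / rec["duration"] * 60
        rows.append({
            "request_rate": rec["request_rate"],
            "completed": rec["completed"],
            "rpm": round(rpm, 2),
            "median_ttft_ms": rec["median_ttft_ms"],
            "output_throughput": rec["output_throughput"],
        })
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Each JSONL line becomes one CSV row, so runs at several request rates end up side by side in a single table.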
The module is invoked via python-fire, allowing any public function to be called directly from the command line.
Usage
Used internally by GitHub Actions workflows to evaluate models, generate benchmark reports, and create step summaries. Typically invoked as:
python .github/scripts/action_tools.py evaluate --models '["model1"]' --datasets '["dataset1"]' --workspace /tmp/eval --evaluate_type chat
python .github/scripts/action_tools.py generate_benchmark_report --report_path /path/to/reports
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: .github/scripts/action_tools.py
- Lines: 1-305
Signature
def run_cmd(cmd_lines: List[str], log_path: str, cwd: str = None) -> int: ...
def add_summary(csv_path: str) -> None: ...
def evaluate(models: List[str], datasets: List[str], workspace: str,
             evaluate_type: str, max_num_workers: int = 8,
             is_smoke: bool = False) -> None: ...
def create_model_links(src_dir: str, dst_dir: str) -> None: ...
def generate_benchmark_report(report_path: str) -> None: ...
def generate_csv_from_profile_result(file_path: str, out_path: str) -> None: ...
def generate_output_for_evaluation(result_dir: str) -> None: ...
Import
# Not typically imported; invoked via python-fire CLI
import fire
fire.Fire()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| models | List[str] | Yes (evaluate) | List of model names to evaluate |
| datasets | List[str] | Yes (evaluate) | List of dataset identifiers for evaluation |
| workspace | str | Yes (evaluate) | Working directory for evaluation outputs |
| evaluate_type | str | Yes (evaluate) | Type of evaluation config (e.g., "chat") |
| report_path | str | Yes (generate_benchmark_report) | Path to benchmark report directory tree |
| csv_path | str | Yes (add_summary) | Path to CSV file to add to step summary |
| cmd_lines | List[str] | Yes (run_cmd) | Shell command split across lines |
| log_path | str | Yes (run_cmd) | Path to log output file |
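To illustrate how cmd_lines and log_path relate, here is a minimal sketch of a run_cmd-style helper. It is not the module's implementation: the name run_cmd_sketch is hypothetical, and joining the fragments with spaces and writing a "Command:" header to the log are assumptions about reasonable behavior.

```python
import subprocess
from typing import List, Optional


def run_cmd_sketch(cmd_lines: List[str], log_path: str, cwd: Optional[str] = None) -> int:
    """Join command fragments, run them in a shell, log all output, return the exit code."""
    cmd = " ".join(cmd_lines)  # assumption: fragments are joined with single spaces
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(f"Command: {cmd}\n")
        log.flush()  # make sure the header lands before the child's output
        # stderr is redirected into the same log file as stdout
        proc = subprocess.run(cmd, shell=True, cwd=cwd,
                              stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

Returning the exit code instead of raising lets the calling workflow decide whether a failed command should abort the job or only be recorded.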
Outputs
| Name | Type | Description |
|---|---|---|
| return_code | int | Shell command exit code (run_cmd) |
| CSV files | file | Evaluation and benchmark results written to disk |
| GitHub Step Summary | side effect | Markdown tables appended to GITHUB_STEP_SUMMARY |
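The Step Summary side effect amounts to rendering CSV rows as a Markdown table and appending it to the file that GitHub Actions names in the GITHUB_STEP_SUMMARY environment variable. The sketch below shows one way to do that with the standard library. The function names csv_to_markdown and append_to_step_summary are hypothetical, and the module's own add_summary may format the table differently.

```python
import csv
import os


def csv_to_markdown(csv_path: str) -> str:
    """Render a CSV file as a Markdown table, treating the first row as the header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]  # skip empty lines
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines) + "\n"


def append_to_step_summary(csv_path: str) -> bool:
    """Append the table to the GITHUB_STEP_SUMMARY file; return False if the variable is unset."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False  # not running inside a GitHub Actions step
    with open(summary_path, "a", encoding="utf-8") as f:
        f.write(csv_to_markdown(csv_path))
    return True
```

Opening the summary file in append mode keeps tables written by earlier calls in the same step intact.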
Usage Examples
# Run a shell command and log output
from action_tools import run_cmd
ret = run_cmd(['python3', '-m', 'pytest', 'tests/'], log_path='/tmp/test.log')
# Evaluate models with OpenCompass
from action_tools import evaluate
evaluate(
    models=['turbomind_internlm2_chat_7b'],
    datasets=['mmlu_datasets'],
    workspace='/tmp/eval_workspace',
    evaluate_type='chat'
)
# Generate benchmark report from result directories
from action_tools import generate_benchmark_report
generate_benchmark_report('/path/to/benchmark/results')
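The "find the latest summary CSV, transpose it, sort it" flow described for generate_output_for_evaluation can be approximated as below. This is only a sketch under stated assumptions: the helper names, the summary_*.csv file pattern, choosing "latest" by modification time, and sorting by the first column are all hypothetical, and the module itself may rely on pandas and a different ordering.

```python
import csv
import glob
import os
from typing import List


def latest_csv(result_dir: str, pattern: str = "**/summary_*.csv") -> str:
    """Return the most recently modified CSV matching the (hypothetical) pattern."""
    candidates = glob.glob(os.path.join(result_dir, pattern), recursive=True)
    if not candidates:
        raise FileNotFoundError(f"no summary CSV under {result_dir}")
    return max(candidates, key=os.path.getmtime)


def transpose_and_sort(csv_path: str) -> List[List[str]]:
    """Swap rows and columns, keep the new header row first, sort the rest by first cell."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]
    transposed = [list(col) for col in zip(*rows)]
    if not transposed:
        return []
    return [transposed[0]] + sorted(transposed[1:], key=lambda r: r[0])
```

After transposition each model occupies one row rather than one column, which tends to read better as a Markdown table when many models are evaluated.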