Implementation:InternLM Lmdeploy Action Tools

From Leeroopedia


Knowledge Sources
Domains CI, Evaluation, Benchmarking
Last Updated 2026-02-07 15:00 GMT

Overview

A collection of utility functions used by GitHub Actions workflows to run shell commands, evaluate models with OpenCompass, generate benchmark reports, and publish results to the job's step summary (the file named by GITHUB_STEP_SUMMARY).

Description

The action_tools.py module provides helper functions that support the lmdeploy CI/CD pipeline. It includes:

  • run_cmd: Executes shell commands with logging, capturing stdout/stderr to a file and returning the exit code.
  • add_summary / _append_summary: Converts CSV data into Markdown table format and appends it to the GitHub Step Summary file (GITHUB_STEP_SUMMARY).
  • evaluate: Orchestrates model evaluation using OpenCompass. It iterates over a list of models, copies and configures the evaluation config file, runs opencompass, and parses the resulting CSV summary into a consolidated output.
  • create_model_links: Creates symbolic links for model directories, used to set up model paths in the test environment.
  • generate_benchmark_report: Traverses benchmark result directories, merges CSV files using pandas, computes averages per group, and writes the results to GitHub Step Summary.
  • generate_csv_from_profile_result: Converts JSONL profiling output into CSV format with columns for request rate, completed requests, RPM, median TTFT, and output throughput.
  • generate_output_for_evaluation: Finds the latest summary CSV, transposes it, sorts it, and outputs the result to GitHub Step Summary.
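The CSV-to-Markdown step described for add_summary / _append_summary can be sketched as follows. This is a minimal illustration using only the standard library, not the module's actual code: the function names csv_to_markdown and append_summary are hypothetical, and it assumes the first CSV row is the header and that cells contain no pipe characters.

```python
import csv
import os


def csv_to_markdown(csv_path: str) -> str:
    """Render a CSV file as a Markdown table (first row is treated as the header)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines) + "\n"


def append_summary(csv_path: str) -> None:
    """Append the rendered table to the file named by GITHUB_STEP_SUMMARY."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return  # variable unset (e.g. outside GitHub Actions): nothing to do
    with open(summary_path, "a", encoding="utf-8") as f:
        f.write(csv_to_markdown(csv_path))
```

Opening the summary file in append mode means several calls within one job step accumulate tables rather than overwrite each other.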

The module is invoked via python-fire, allowing any public function to be called directly from the command line.
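As a rough illustration of that dispatch, the sketch below hands a function to fire.Fire so it becomes a CLI subcommand. The function greet and the wrapper main are hypothetical and exist only for this example; the real module exposes its own functions (evaluate, generate_benchmark_report, and so on).

```python
def greet(name: str, times: int = 1) -> str:
    """Hypothetical public function; any module-level callable is exposed the same way."""
    return " ".join([f"hello {name}"] * times)


def main(argv=None):
    # Imported lazily so the helper above stays importable without python-fire.
    import fire

    # fire.Fire maps subcommand names to callables and parses flags such as
    # --name / --times into keyword arguments. `command` may be a list of
    # argument strings; when it is None, sys.argv[1:] is used.
    return fire.Fire({"greet": greet}, command=argv)
```

Under these assumptions, calling main(["greet", "--name", "ci", "--times", "2"]) would dispatch to greet(name="ci", times=2). List-valued flags such as --models '["model1"]' are parsed by python-fire into Python lists before the function is called.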

Usage

Used internally by GitHub Actions workflows to evaluate models, generate benchmark reports, and create step summaries. Typically invoked as:

python .github/scripts/action_tools.py evaluate --models '["model1"]' --datasets '["dataset1"]' --workspace /tmp/eval --evaluate_type chat
python .github/scripts/action_tools.py generate_benchmark_report --report_path /path/to/reports

Code Reference

Source Location

Signature

def run_cmd(cmd_lines: List[str], log_path: str, cwd: str = None) -> int: ...

def add_summary(csv_path: str) -> None: ...

def evaluate(models: List[str], datasets: List[str], workspace: str,
             evaluate_type: str, max_num_workers: int = 8,
             is_smoke: bool = False) -> None: ...

def create_model_links(src_dir: str, dst_dir: str) -> None: ...

def generate_benchmark_report(report_path: str) -> None: ...

def generate_csv_from_profile_result(file_path: str, out_path: str) -> None: ...

def generate_output_for_evaluation(result_dir: str) -> None: ...
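For generate_csv_from_profile_result, the JSONL-to-CSV conversion might look roughly like the sketch below. The JSONL field names (request_rate, completed, duration, median_ttft_ms, output_throughput), the RPM formula, and the function name profile_jsonl_to_csv are assumptions made for illustration; the source only names the output columns.

```python
import csv
import json


def profile_jsonl_to_csv(file_path: str, out_path: str) -> None:
    """Convert one-JSON-object-per-line profiling output into a CSV table."""
    rows = [["request_rate", "completed", "RPM", "median_ttft_ms", "output_throughput"]]
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            rec = json.loads(line)
            # Assumed definition: completed requests per minute of wall-clock time.
            rpm = rec["completed"] / rec["duration"] * 60
            rows.append([
                rec["request_rate"],
                rec["completed"],
                round(rpm, 2),
                round(rec["median_ttft_ms"], 2),
                round(rec["output_throughput"], 2),
            ])
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```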

Import

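# Not typically imported; the module is run as a script and dispatched by python-fire
import fire

if __name__ == '__main__':
    fire.Fire()  # exposes the module's functions as CLI subcommands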

I/O Contract

Inputs

Name | Type | Required | Description
models | List[str] | Yes (evaluate) | List of model names to evaluate
datasets | List[str] | Yes (evaluate) | List of dataset identifiers for evaluation
workspace | str | Yes (evaluate) | Working directory for evaluation outputs
evaluate_type | str | Yes (evaluate) | Type of evaluation config (e.g., "chat")
report_path | str | Yes (generate_benchmark_report) | Path to benchmark report directory tree
csv_path | str | Yes (add_summary) | Path to CSV file to add to step summary
cmd_lines | List[str] | Yes (run_cmd) | Shell command split across lines
log_path | str | Yes (run_cmd) | Path to log output file

Outputs

Name | Type | Description
return_code | int | Shell command exit code (run_cmd)
CSV files | file | Evaluation and benchmark results written to disk
GitHub Step Summary | side effect | Markdown tables appended to GITHUB_STEP_SUMMARY
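The run_cmd contract above (a command given as a list of fragments, a log file, an integer exit code) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the module's code: the name run_and_log is hypothetical, and joining the fragments with spaces before tokenizing is an assumption about how cmd_lines is interpreted.

```python
import shlex
import subprocess
from typing import List, Optional


def run_and_log(cmd_lines: List[str], log_path: str, cwd: Optional[str] = None) -> int:
    """Run a command, send stdout and stderr to log_path, and return the exit code."""
    cmd = " ".join(cmd_lines)  # assumption: fragments are joined with single spaces
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(f"Command: {cmd}\n")
        log.flush()  # keep the header ahead of the child process's output
        proc = subprocess.run(
            shlex.split(cmd),          # tokenize without invoking a shell
            cwd=cwd,
            stdout=log,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout in the log
        )
    return proc.returncode
```

Returning the exit code instead of raising lets the calling workflow step decide whether a non-zero result should fail the job or only be reported.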

Usage Examples

# Run a shell command and log output
# (assumes .github/scripts is on sys.path so that action_tools is importable)
from action_tools import run_cmd
ret = run_cmd(['python3', '-m', 'pytest', 'tests/'], log_path='/tmp/test.log')

# Evaluate models with OpenCompass
from action_tools import evaluate
evaluate(
    models=['turbomind_internlm2_chat_7b'],
    datasets=['mmlu_datasets'],
    workspace='/tmp/eval_workspace',
    evaluate_type='chat'
)

# Generate benchmark report from result directories
from action_tools import generate_benchmark_report
generate_benchmark_report('/path/to/benchmark/results')
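The "find the latest summary CSV, transpose it, sort it" behaviour described for generate_output_for_evaluation can be approximated with the standard library as shown below. This sketch deliberately avoids pandas, and several details are assumptions: the file pattern summary_*.csv, choosing "latest" by modification time, and sorting the transposed rows by their first column. The function names are hypothetical.

```python
import csv
import glob
import os
from typing import List


def latest_summary_csv(result_dir: str) -> str:
    """Return the most recently modified summary_*.csv under result_dir."""
    pattern = os.path.join(result_dir, "**", "summary_*.csv")  # hypothetical naming
    candidates = glob.glob(pattern, recursive=True)
    if not candidates:
        raise FileNotFoundError(f"no summary CSV found under {result_dir}")
    return max(candidates, key=os.path.getmtime)


def transpose_sorted(csv_path: str) -> List[List[str]]:
    """Swap rows and columns, then sort the data rows by their first cell."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows:
        return []
    transposed = [list(col) for col in zip(*rows)]  # columns become rows
    header, body = transposed[0], transposed[1:]
    return [header] + sorted(body, key=lambda row: row[0])
```

After transposition each model occupies one row and each dataset one column, which tends to read better in a step summary when many datasets are evaluated for a few models.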
