
Implementation:InternLM Lmdeploy Eval Chat Config

From Leeroopedia


Knowledge Sources
Domains: Evaluation, Configuration, Benchmarking
Last Updated: 2026-02-07 15:00 GMT

Overview

An OpenCompass evaluation configuration file that defines per-model settings for benchmarking chat models from several LLM families (InternLM, Llama, Qwen, Gemma, etc.) on both the TurboMind and PyTorch backends, with a range of quantization settings.

Description

The eval_chat_config.py file is a Python-based configuration consumed by the OpenCompass evaluation framework. It defines:

  • Dataset imports: Evaluation datasets including BBH, C-Eval, CMMLU, CrowS-Pairs, GaokaoBench, GPQA, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MBPP, MMLU, MMLU-Pro, NQ, RACE, TheoremQA, TriviaQA, and Winogrande.
  • Model configurations: Creates deepcopy-based configurations for each model variant:
    • TurboMind backend: base, 4-bit AWQ quantization, KV-cache INT4, KV-cache INT8
    • PyTorch backend: base, W8A8 quantization
    • Models covered include InternLM2/2.5/3, Qwen 1.5/2/2.5/3 (including MoE variants up to 235B), Llama 2/3/3.1, Gemma 2, Baichuan 2, and Mixtral
  • Configuration updates: Programmatic loops that set backend types, quantization formats, batch sizes, tensor parallelism, and abbreviation strings based on naming conventions (e.g., _4bits, _kvint4, _w8a8).
  • Summarizer: A comprehensive summarizer configuration listing all dataset abbreviations and summary groups for result aggregation.
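The suffix-driven configuration updates described in the list above can be sketched roughly as follows. This is a minimal illustration, not the file's actual code: the `configs` dictionary, the `base` entry, the `backend` key, and the specific values written into `engine_config` (`model_format='awq'`, `quant_policy=4` or `8`) are assumptions chosen to show the pattern of keying behaviour off the variable name.

```python
from copy import deepcopy

# Hypothetical base entry standing in for an imported OpenCompass model config.
base = dict(
    abbr='internlm2-chat-7b',
    path='internlm/internlm2-chat-7b',
    engine_config=dict(session_len=2048, max_batch_size=16, tp=1),
)

# One independent copy per variant; the name suffix encodes the variant.
configs = {
    'turbomind_internlm2_chat_7b': deepcopy(base),
    'turbomind_internlm2_chat_7b_4bits': deepcopy(base),
    'turbomind_internlm2_chat_7b_kvint4': deepcopy(base),
    'turbomind_internlm2_chat_7b_kvint8': deepcopy(base),
    'pytorch_internlm2_chat_7b': deepcopy(base),
    'pytorch_internlm2_chat_7b_w8a8': deepcopy(base),
}

for name, cfg in configs.items():
    # The prefix before the first underscore selects the backend (assumed key).
    cfg['backend'] = name.split('_', 1)[0]

    # The suffix selects the quantization setting and extends the abbreviation.
    if name.endswith('_4bits'):
        cfg['engine_config']['model_format'] = 'awq'  # assumed value
        cfg['abbr'] += '_4bits'
    elif name.endswith('_kvint4'):
        cfg['engine_config']['quant_policy'] = 4  # assumed value
        cfg['abbr'] += '_kvint4'
    elif name.endswith('_kvint8'):
        cfg['engine_config']['quant_policy'] = 8  # assumed value
        cfg['abbr'] += '_kvint8'
    elif name.endswith('_w8a8'):
        cfg['abbr'] += '_w8a8'
```

Because every variant starts from its own deep copy, a change made for one suffix never leaks into another variant or into the base entry.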

Key constants: MAX_SESSION_LEN = 2048, MAX_NEW_TOKENS = 1024.

Usage

This configuration is consumed by the evaluate function in action_tools.py, which copies it, appends dataset and model selections, and passes it to the opencompass CLI.
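The copy-and-append step might look something like the sketch below. It is an illustration under stated assumptions, not the code in action_tools.py: the helper name `prepare_config`, its parameters, and the exact text of the appended lines are hypothetical. The demo uses a tiny stand-in config file so that it runs without OpenCompass installed.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_config(src: Path, workspace: Path, models: list, datasets: list) -> Path:
    """Copy the base config into the workspace and append the selections.

    Hypothetical helper: names are written unquoted so that they resolve
    to variables already defined earlier in the copied config file.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    dst = workspace / src.name
    shutil.copy(src, dst)
    with dst.open('a', encoding='utf-8') as f:
        # Each dataset variable is a list, so flatten them into one list.
        f.write(f"\ndatasets = sum([{', '.join(datasets)}], [])\n")
        f.write(f"models = [{', '.join(models)}]\n")
    return dst


# Demo with a stand-in config that defines one dataset list and one model.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'eval_chat_config.py'
    src.write_text(
        "mmlu_datasets = [dict(abbr='mmlu')]\n"
        "turbomind_internlm2_chat_7b = dict(abbr='demo')\n",
        encoding='utf-8',
    )
    out = prepare_config(
        src,
        Path(tmp) / 'work',
        models=['turbomind_internlm2_chat_7b'],
        datasets=['mmlu_datasets'],
    )
    generated = out.read_text(encoding='utf-8')

# Evaluate the generated file to confirm the appended names resolve.
namespace = {}
exec(generated, namespace)
```

The generated copy, rather than the original file, would then be the config handed to the opencompass command together with a work directory.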

Code Reference

Source Location

Signature

# Configuration variables (not functions)
MAX_SESSION_LEN = 2048
MAX_NEW_TOKENS = 1024

# Example model config pattern:
turbomind_internlm2_chat_7b = deepcopy(*lmdeploy_internlm2_chat_7b)
pytorch_internlm2_chat_7b = deepcopy(*lmdeploy_internlm2_chat_7b)

# Base model for Qwen3 family:
base_model = dict(
    type=TurboMindModelwithChatTemplate,
    engine_config=dict(session_len=32768, max_batch_size=256),
    gen_config=dict(top_k=1, temperature=1e-6, top_p=0.9, max_new_tokens=32768),
    ...
)

# Summarizer configuration
summarizer = dict(dataset_abbrs=[...], summary_groups=[...])
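The `deepcopy(*name)` form in the model config pattern above is worth a short illustration. The sketch below assumes that the imported name refers to a list containing exactly one config dict, so the star unpacks that single dict as the argument to `deepcopy`. The list contents and the variable names `turbomind_cfg` and `pytorch_cfg` are stand-ins for illustration.

```python
from copy import deepcopy

# Stand-in for an imported OpenCompass model list (assumed: exactly one dict).
lmdeploy_internlm2_chat_7b = [
    dict(abbr='internlm2-chat-7b', engine_config=dict(tp=1)),
]

# The star unpacks the single element, so each call returns a fresh dict.
turbomind_cfg = deepcopy(*lmdeploy_internlm2_chat_7b)
pytorch_cfg = deepcopy(*lmdeploy_internlm2_chat_7b)

# Nested values are copied too, so this edit stays local to pytorch_cfg.
pytorch_cfg['engine_config']['tp'] = 2
```

Unpacking only makes sense here when the list has a single element; with more elements, the extra items would be passed as further positional arguments to `deepcopy` instead of being copied.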

Import

from copy import deepcopy
from opencompass.models import TurboMindModelwithChatTemplate
from opencompass.utils.text_postprocessors import extract_non_reasoning_content

I/O Contract

Inputs

Name | Type | Required | Description
OpenCompass base configs | Python modules | Yes | Imported dataset and model configs from opencompass.configs
Model path conventions | str | Yes | Model paths following HuggingFace naming (e.g., "Qwen/Qwen3-32B")

Outputs

Name | Type | Description
Model config dicts | dict | Per-model configuration dictionaries consumed by OpenCompass
summarizer | dict | Summarizer configuration for aggregating evaluation results
datasets | list | Appended at runtime by the evaluate function in action_tools.py

Usage Examples

# This file is used indirectly via action_tools.py:
# python .github/scripts/action_tools.py evaluate \
#     --models '["turbomind_internlm2_chat_7b"]' \
#     --datasets '["mmlu_datasets"]' \
#     --workspace /tmp/eval \
#     --evaluate_type chat

# Or directly with opencompass:
# opencompass .github/scripts/eval_chat_config.py -w /tmp/work_dir
