Implementation:InternLM Lmdeploy Autotest Config

Knowledge Sources	InternLM_Lmdeploy
Domains	Testing, Configuration, CI
Last Updated	2026-02-07 15:00 GMT

Overview

A YAML configuration file that defines the complete test matrix for lmdeploy automated testing, specifying model paths, tensor parallelism settings, backend-specific model lists, quantization exclusions, and benchmark/evaluation model selections.

Description

The autotest/config.yml file serves as the central configuration for the lmdeploy autotest framework. It organizes the test matrix across multiple dimensions:

Global paths: Defines paths for models (/nvme/qa_test_models), resources, logs, server logs, evaluation reports, benchmark reports, and datasets (ShareGPT).
Tensor parallelism (TP) config: Maps model identifiers to their required TP values (e.g., Qwen3-235B-A22B requires TP=8, InternVL3-38B requires TP=2).
TurboMind chat models: Lists all models tested with the TurboMind backend, including Llama, InternLM, InternVL, Qwen, Mistral, DeepSeek, CodeLlama, GLM, MiniCPM, and others.
PyTorch chat models: Lists all models tested with the PyTorch backend, including additional models like Gemma, Phi, and deepseek-moe variants.
VL (Vision-Language) models: Separate lists for TurboMind and PyTorch backends covering multimodal models (InternVL, Qwen-VL, LLaVA, CogVLM, MiniCPM-V, Phi-3-vision).
Base models: Lists for completion-only models without chat templates.
Quantization config: Specifies which models are included in or excluded from quantization tests:
- no_awq: Models that cannot be AWQ-quantized
- gptq: Models tested with GPTQ
- no_kvint4/no_kvint8: Models excluded from 4-bit and 8-bit KV-cache quantization, respectively
Benchmark models: Models selected for performance benchmarking.
Evaluate models: Models selected for accuracy evaluation with OpenCompass.
MLLM evaluate models: Multimodal models selected for vision-language evaluation.

The configuration targets the A100 environment (env_tag: a100).
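
As a rough illustration of how the model lists, TP mapping, and exclusion lists relate, the sketch below filters a backend's model list against an exclusion list and resolves a TP value. It is a minimal sketch, not the framework's actual code: the dict stands in for the parsed YAML, the model names under example/ are hypothetical placeholders, the helpers eligible_models and tp_for are invented for illustration, and the default of TP=1 for unlisted models is an assumption.

```python
# Stand-in for the dict that yaml.safe_load would return for config.yml.
# The "example/..." model names are hypothetical placeholders, not real entries.
config = {
    "config": {"tp": {"example/model-c": 2}},
    "turbomind_chat_model": {
        "tp": ["example/model-a", "example/model-b", "example/model-c"],
    },
    "turbomind_quantization": {
        "no_awq": ["example/model-b"],
        "gptq": ["example/model-a"],
        "no_kvint4": ["example/model-c"],
    },
}


def eligible_models(config, backend_key, exclusion_key):
    """Return the backend's models that are absent from one exclusion list.

    Hypothetical helper; only the turbomind_quantization section is consulted.
    """
    models = config[backend_key]["tp"]
    excluded = set(config["turbomind_quantization"].get(exclusion_key, []))
    return [m for m in models if m not in excluded]


def tp_for(config, model):
    """Look up a model's TP value, assuming unlisted models run with TP=1."""
    return config["config"]["tp"].get(model, 1)


awq_models = eligible_models(config, "turbomind_chat_model", "no_awq")
kvint4_models = eligible_models(config, "turbomind_chat_model", "no_kvint4")

for model in awq_models:
    print(f"AWQ candidate: {model} (tp={tp_for(config, model)})")
```

Note that gptq is read differently from the no_* keys: it lists the models to test rather than the models to skip, so it would be used directly instead of being passed through a filter like this one.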

Usage

The file is consumed by the pytest-based autotest framework, which dynamically generates test cases from model-backend-quantization combinations.
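
One way such a test matrix could be expanded is sketched below. This is an assumption-laden illustration rather than the framework's real fixture code: build_cases, BACKENDS, QUANT_SECTIONS, and the quantization mode labels "none" and "awq" are all hypothetical, the example/ model names are placeholders, and only the turbomind_quantization section documented on this page is consulted.

```python
from itertools import product

# Stand-in for the parsed config.yml; "example/..." names are hypothetical.
config = {
    "config": {"tp": {"example/model-c": 4}},
    "turbomind_chat_model": {"tp": ["example/model-a", "example/model-b"]},
    "pytorch_chat_model": {"tp": ["example/model-c"]},
    "turbomind_quantization": {"no_awq": ["example/model-b"]},
}

# Backend name -> config key holding that backend's chat model list.
BACKENDS = {
    "turbomind": "turbomind_chat_model",
    "pytorch": "pytorch_chat_model",
}

# Only the quantization section shown on this page is mapped here.
QUANT_SECTIONS = {"turbomind": "turbomind_quantization"}


def build_cases(config, quant_modes=("none", "awq")):
    """Expand the config into (backend, model, quant, tp) tuples."""
    cases = []
    for backend, key in BACKENDS.items():
        models = config.get(key, {}).get("tp", [])
        section = config.get(QUANT_SECTIONS.get(backend, ""), {})
        no_awq = set(section.get("no_awq", []))
        for model, quant in product(models, quant_modes):
            if quant == "awq" and model in no_awq:
                continue  # skip combinations the config excludes
            tp = config["config"]["tp"].get(model, 1)  # assumed default TP=1
            cases.append((backend, model, quant, tp))
    return cases


cases = build_cases(config)
for case in cases:
    print(case)
```

A list like cases could then be handed to pytest.mark.parametrize so that each tuple becomes one generated test.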

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: autotest/config.yml
Lines: 1-435

Signature

model_path: /nvme/qa_test_models
resource_path: /nvme/qa_test_models/resource
log_path: /nvme/qa_test_models/autotest_log
env_tag: a100
device: cuda

config:
    tp:
        meta-llama/Meta-Llama-3-1-70B-Instruct: 4
        internlm/Intern-S1: 8
        Qwen/Qwen3-235B-A22B: 8
        # ...

turbomind_chat_model:
    tp:
        - meta-llama/Meta-Llama-3-1-8B-Instruct
        - internlm/internlm3-8b-instruct
        # ...

pytorch_chat_model:
    tp:
        - meta-llama/Llama-4-Scout-17B-16E-Instruct
        # ...

turbomind_quantization:
    no_awq: [...]
    gptq: [...]
    no_kvint4: [...]

I/O Contract

Inputs

Name	Type	Required	Description
YAML config file	file	Yes	The config.yml file itself, loaded by the autotest framework

Outputs

Name	Type	Description
Test matrix	dict	Parsed configuration consumed by pytest fixtures to generate test cases
Model lists	lists	Per-backend lists of models to test
TP mappings	dict	Tensor parallelism requirements per model
Quantization exclusions	dict	Models to skip for specific quantization methods

Usage Examples

import yaml

with open('autotest/config.yml', 'r') as f:
    config = yaml.safe_load(f)

# Get all turbomind chat models
turbomind_models = config['turbomind_chat_model']['tp']

# Get TP setting for a specific model
tp = config['config']['tp'].get('Qwen/Qwen3-235B-A22B', 1)

# Get models excluded from AWQ quantization
no_awq = config['turbomind_quantization']['no_awq']

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
