Implementation:EvolvingLMMs Lab Lmms eval sglang qwen3vl script
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Model Evaluation, SGLang Backend, Vision Language Models |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Bash script for evaluating Qwen3-VL models with the SGLang backend, optionally with MCP tool calling.
Description
This example script shows how to evaluate Qwen3-VL vision-language models with the SGLang inference backend. It supports both basic evaluation and tool-enabled evaluation through the Model Context Protocol (MCP). It also exposes configuration options for parallelism, memory management, visual preprocessing, and task selection. Comments in the script document parameter usage, the tool-calling workflow, and recommended practices.
Usage
Use this script to evaluate Qwen3-VL models (the 30B or 235B variants) with SGLang-accelerated inference. It is most useful on multi-GPU machines that need tensor parallelism, and when the model must call tools exposed by an MCP server.
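On a multi-GPU node, one way to choose TENSOR_PARALLEL_SIZE is to take the largest power of two that fits on the available GPUs. This is a sketch under assumptions: `pick_tp_size` is a hypothetical helper, and the power-of-two rule is a conservative choice because the tensor parallel size generally has to divide the model's attention-head count. The original script simply hard-codes `TENSOR_PARALLEL_SIZE=4`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): return the largest
# power-of-two tensor parallel size that fits on the given number of GPUs.
# Assumption: power-of-two sizes are chosen because the TP size generally
# must divide the model's attention-head count.
pick_tp_size() {
  local gpus=$1   # number of GPUs available on this node
  local tp=1
  while (( tp * 2 <= gpus )); do
    tp=$(( tp * 2 ))
  done
  echo "$tp"
}

# Usage: nvidia-smi --list-gpus prints one line per GPU.
# NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)
# TENSOR_PARALLEL_SIZE=$(pick_tp_size "$NUM_GPUS")
```

With 6 GPUs this yields 4, leaving two GPUs idle rather than picking a size that may not divide the head count.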
Code Reference
Source Location
- Repository: EvolvingLMMs_Lab_Lmms_eval
- File: examples/models/sglang_qwen3vl.sh
- Lines: 1-220
Key Configuration Variables
```bash
MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4
GPU_MEMORY_UTILIZATION=0.85
BATCH_SIZE=64
MAX_PIXELS=1605632
MIN_PIXELS=784
MAX_FRAME_NUM=32
THREADS=16
TASKS="mmmu_val,mme"
OUTPUT_PATH="./logs/qwen3vl_sglang"
```
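Before a long run, obviously inconsistent values can be caught early. The `validate_config` helper below is hypothetical, and its checks (memory fraction in (0, 1], MIN_PIXELS not above MAX_PIXELS, positive counts) are assumptions drawn from the meaning of each variable, not checks the original script performs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the configuration variables above.
# Returns non-zero and prints a message on the first failed check.
validate_config() {
  # Bash arithmetic is integer-only, so compare the float with awk.
  if ! awk -v u="$GPU_MEMORY_UTILIZATION" 'BEGIN { exit !(u > 0 && u <= 1) }'; then
    echo "GPU_MEMORY_UTILIZATION must be in (0, 1], got $GPU_MEMORY_UTILIZATION" >&2
    return 1
  fi
  if (( MIN_PIXELS > MAX_PIXELS )); then
    echo "MIN_PIXELS ($MIN_PIXELS) exceeds MAX_PIXELS ($MAX_PIXELS)" >&2
    return 1
  fi
  if (( TENSOR_PARALLEL_SIZE < 1 || MAX_FRAME_NUM < 1 || THREADS < 1 )); then
    echo "TENSOR_PARALLEL_SIZE, MAX_FRAME_NUM and THREADS must be positive" >&2
    return 1
  fi
}

# Usage, after the variables are set:
# validate_config || exit 1
```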
Main Command Structure
```bash
uv run python -m lmms_eval \
    --model sglang \
    --model_args model=${MODEL},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},gpu_memory_utilization=${GPU_MEMORY_UTILIZATION},max_pixels=${MAX_PIXELS},min_pixels=${MIN_PIXELS},max_frame_num=${MAX_FRAME_NUM},threads=${THREADS} \
    --tasks ${TASKS} \
    --batch_size ${BATCH_SIZE} \
    --output_path ${OUTPUT_PATH}
```
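To switch between plain and tool-enabled runs without editing the command, the `--model_args` string can be assembled from the variables. In this sketch `build_model_args` is a hypothetical helper; the MCP keys (`mcp_server_path`, `work_dir`, `max_turn`) are the ones used in the tool-enabled example on this page, and they are appended only when MCP_SERVER_PATH is set. The MAX_TURN fallback of 5 mirrors the default listed in the inputs table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the comma-separated --model_args value from
# the configuration variables, adding MCP keys only for tool-enabled runs.
build_model_args() {
  local args="model=${MODEL}"
  args+=",tensor_parallel_size=${TENSOR_PARALLEL_SIZE}"
  args+=",gpu_memory_utilization=${GPU_MEMORY_UTILIZATION}"
  args+=",max_pixels=${MAX_PIXELS},min_pixels=${MIN_PIXELS}"
  args+=",max_frame_num=${MAX_FRAME_NUM},threads=${THREADS}"
  # Tool calling is opt-in: only add MCP keys when a server path is given.
  if [[ -n "${MCP_SERVER_PATH:-}" ]]; then
    args+=",mcp_server_path=${MCP_SERVER_PATH}"
    if [[ -n "${WORK_DIR:-}" ]]; then
      args+=",work_dir=${WORK_DIR}"
    fi
    args+=",max_turn=${MAX_TURN:-5}"
  fi
  echo "$args"
}

# Usage:
# uv run python -m lmms_eval --model sglang \
#     --model_args "$(build_model_args)" \
#     --tasks "${TASKS}" --batch_size "${BATCH_SIZE}" --output_path "${OUTPUT_PATH}"
```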
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| MODEL | String | Yes | Qwen3-VL model ID on the Hugging Face Hub (e.g., "Qwen/Qwen3-VL-30B-A3B-Instruct") |
| TENSOR_PARALLEL_SIZE | Integer | No | Number of GPUs to shard the model across with tensor parallelism (default: 1) |
| GPU_MEMORY_UTILIZATION | Float | No | Fraction of GPU memory to use, from 0.0 to 1.0 (default: 0.85) |
| BATCH_SIZE | Integer | No | Evaluation batch size (default: 64) |
| MAX_PIXELS | Integer | No | Maximum number of pixels per image (default: 1605632) |
| MIN_PIXELS | Integer | No | Minimum number of pixels per image (default: 784) |
| MAX_FRAME_NUM | Integer | No | Maximum number of video frames (default: 32) |
| THREADS | Integer | No | Number of threads for visual preprocessing (default: 16) |
| TASKS | String | Yes | Comma-separated task names (e.g., "mmmu_val,mme") |
| OUTPUT_PATH | String | Yes | Directory for output logs |
| MCP_SERVER_PATH | String | No | Path to the MCP server script used for tool calling |
| WORK_DIR | String | No | Working directory for MCP tools (default: /tmp/...) |
| MAX_TURN | Integer | No | Maximum number of tool-calling turns (default: 5) |
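If the script should accept overrides from the environment, the defaults in the table can be written as `${VAR:=default}` expansions, with an early failure when a required input is missing. This is an assumption about how one might restructure the script (the original hard-codes its values), and `check_required` is a hypothetical helper. WORK_DIR is left out because its default is not fully given here.

```bash
#!/usr/bin/env bash
# Assign the table's defaults only when a variable is unset or empty,
# so callers can override any of them from the environment.
: "${TENSOR_PARALLEL_SIZE:=1}"
: "${GPU_MEMORY_UTILIZATION:=0.85}"
: "${BATCH_SIZE:=64}"
: "${MAX_PIXELS:=1605632}"
: "${MIN_PIXELS:=784}"
: "${MAX_FRAME_NUM:=32}"
: "${THREADS:=16}"
: "${MAX_TURN:=5}"

# Hypothetical helper: report every required input that is empty or unset.
check_required() {
  local missing=0 name
  for name in MODEL TASKS OUTPUT_PATH; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [[ -z "${!name:-}" ]]; then
      echo "error: $name is required" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_required || exit 1
```

A caller could then run, for example, `TENSOR_PARALLEL_SIZE=8 bash run_eval.sh`, where `run_eval.sh` stands in for a script restructured this way.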
Outputs
| Name | Type | Description |
|---|---|---|
| Evaluation logs | JSON files | Per-task evaluation results written to OUTPUT_PATH |
| Sample logs | JSON files | Per-sample predictions (written only when --log_samples is passed) |
| Console output | Text | Progress and metrics printed to stdout |
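After a run, the JSON outputs can be located with `find`. The file names and subdirectory layout inside OUTPUT_PATH are not described on this page, so this sketch searches recursively for any `*.json` file; `list_result_files` is a hypothetical helper. Sample-level files only appear when the run was started with `--log_samples`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every JSON file under an output directory,
# sorted by path. The recursive search is an assumption about the layout.
list_result_files() {
  local dir=${1:?usage: list_result_files OUTPUT_PATH}
  find "$dir" -type f -name '*.json' | sort
}

# Usage:
# list_result_files ./logs/qwen3vl_sglang
```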
Usage Examples
Basic Evaluation (Without Tools)
```bash
#!/bin/bash
MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4
TASKS="mmmu_val,mme"
BATCH_SIZE=64
OUTPUT_PATH="./logs/qwen3vl_sglang"
uv run python -m lmms_eval \
    --model sglang \
    --model_args model=${MODEL},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},gpu_memory_utilization=0.85,max_pixels=1605632,min_pixels=784,max_frame_num=32,threads=16 \
    --tasks ${TASKS} \
    --batch_size ${BATCH_SIZE} \
    --output_path ${OUTPUT_PATH}
```
Tool-Enabled Evaluation (With MCP)
```bash
#!/bin/bash
MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
MCP_SERVER_PATH="/path/to/mcp_server.py"
WORK_DIR="/tmp/sglang_mcp_work"
uv run python -m lmms_eval \
    --model sglang \
    --model_args model=${MODEL},tensor_parallel_size=4,gpu_memory_utilization=0.85,max_pixels=1605632,min_pixels=784,max_frame_num=32,threads=16,mcp_server_path=${MCP_SERVER_PATH},work_dir=${WORK_DIR},max_turn=5 \
    --tasks mmmu_val \
    --batch_size 1 \
    --output_path ./logs/qwen3vl_with_mcp
```
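A tool-enabled run can fail late if the MCP server path is wrong, so a quick pre-launch check may save time. Whether lmms_eval creates WORK_DIR on its own is not stated here, so creating it up front is a defensive assumption; `prepare_mcp` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail fast if the MCP server script is missing,
# then make sure the tool working directory exists.
prepare_mcp() {
  if [[ ! -f "$MCP_SERVER_PATH" ]]; then
    echo "MCP server script not found: $MCP_SERVER_PATH" >&2
    return 1
  fi
  mkdir -p "$WORK_DIR"
}

# Usage, before the lmms_eval command:
# prepare_mcp || exit 1
```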
Tool Calling Workflow
The script's comments describe the MCP tool-calling loop:
1. The user sends a request containing the question.
2. SGLang processes the messages and generates a response.
3. The function-call parser checks for tool calls (finish_reason == "tool_calls").
4. If tool calls are detected:
   a. Parse the tool call's function name and arguments.
   b. Retrieve the tool definition from MCPClient.
   c. Execute the tool with MCPClient.run_tool(tool_name, arguments).
   d. Convert the tool result to an OpenAI-compatible format.
   e. Append the tool result to the conversation as {"role": "tool", ...}.
   f. Generate the next response with the updated context.
5. Repeat until the model returns a final text answer or max_turn is reached.
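The control flow of the loop above can be sketched in bash with stubs in place of the model and the tools. This is an illustration only: `generate_turn` and `run_tool` are hypothetical stand-ins for the SGLang generation call and `MCPClient.run_tool`, and counting max_turn as generation turns is an assumption about how the limit is applied.

```bash
#!/usr/bin/env bash
MAX_TURN=${MAX_TURN:-5}

# Stub for one generation step: prints the finish_reason. It pretends the
# model requests a tool on its first two turns, then answers in plain text.
generate_turn() {
  local turn=$1
  if (( turn < 2 )); then echo "tool_calls"; else echo "stop"; fi
}

# Stub for MCPClient.run_tool(tool_name, arguments). In the real flow the
# result is converted and appended as a {"role": "tool", ...} message.
run_tool() {
  echo "tool result for turn $1"
}

# Loop until the model stops asking for tools or MAX_TURN turns are used.
tool_loop() {
  local turn=0 finish_reason
  while (( turn < MAX_TURN )); do
    finish_reason=$(generate_turn "$turn")
    if [[ "$finish_reason" != "tool_calls" ]]; then
      echo "final answer after $turn tool turn(s)"
      return 0
    fi
    run_tool "$turn" >/dev/null   # result would be added to the context here
    turn=$(( turn + 1 ))
  done
  echo "stopped: reached max_turn=$MAX_TURN"
  return 1
}
```

With the default MAX_TURN=5, `tool_loop` prints `final answer after 2 tool turn(s)`; with MAX_TURN=2 the limit is hit first and it reports `stopped: reached max_turn=2`.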