Implementation:EvolvingLMMs Lab Lmms eval sglang qwen3vl script

Knowledge Sources	EvolvingLMMs_Lab_Lmms_eval
Domains	Model Evaluation, SGLang Backend, Vision Language Models
Last Updated	2026-02-14 00:00 GMT

Overview

Bash script for evaluating Qwen3-VL models with the SGLang backend, with optional MCP tool calling support.

Description

This example script shows how to evaluate Qwen3-VL vision-language models with the SGLang inference backend. It supports both basic evaluation and tool-enabled evaluation through the Model Context Protocol (MCP), and it exposes configuration options for parallelization, memory management, and task selection. Comments in the script document parameter usage, the tool calling workflow, and best practices.

Usage

Use this script when you need to evaluate Qwen3-VL models (30B or 235B variants) with accelerated inference via SGLang. It is particularly useful for multi-GPU setups requiring tensor parallelism and scenarios where tool calling capabilities are needed through MCP servers.

Code Reference

Source Location

Repository: EvolvingLMMs_Lab_Lmms_eval
File: examples/models/sglang_qwen3vl.sh
Lines: 1-220

Key Configuration Variables

MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4
GPU_MEMORY_UTILIZATION=0.85
BATCH_SIZE=64
MAX_PIXELS=1605632
MIN_PIXELS=784
MAX_FRAME_NUM=32
THREADS=16
TASKS="mmmu_val,mme"
OUTPUT_PATH="./logs/qwen3vl_sglang"
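
The two pixel limits are easier to sanity-check when written as multiples of 784 = 28 x 28. The sketch below only verifies that arithmetic. Reading 28 x 28 as the pixel area behind one visual token is an assumption borrowed from Qwen2-VL-style image processors, not something this page states, and PIXEL_UNIT is a hypothetical name.

```shell
# Assumption: 28 x 28 pixels is the unit the limits are expressed in.
PIXEL_UNIT=$((28 * 28))

# The configured values factor cleanly into that unit.
MIN_PIXELS=$((1 * PIXEL_UNIT))
MAX_PIXELS=$((2048 * PIXEL_UNIT))

echo "min_pixels=${MIN_PIXELS} max_pixels=${MAX_PIXELS}"
```

Under that reading, MAX_PIXELS caps an image at 2048 units and MIN_PIXELS allows it to shrink to a single unit.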

Main Command Structure

uv run python -m lmms_eval \
    --model sglang \
    --model_args model=${MODEL},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},gpu_memory_utilization=${GPU_MEMORY_UTILIZATION},max_pixels=${MAX_PIXELS},min_pixels=${MIN_PIXELS},max_frame_num=${MAX_FRAME_NUM},threads=${THREADS} \
    --tasks ${TASKS} \
    --batch_size ${BATCH_SIZE} \
    --output_path ${OUTPUT_PATH}
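
The comma-separated --model_args value is long and easy to mistype. As a minimal sketch, the string can be assembled from one key=value pair per line and printed as a dry run before launching anything. The join_by_comma helper is hypothetical and not part of the original script.

```shell
# Hypothetical helper (not in the original script): join arguments with commas.
join_by_comma() {
    out=""
    for kv in "$@"; do
        # Prepend a comma only when something has already been collected.
        out="${out:+${out},}${kv}"
    done
    printf '%s\n' "$out"
}

MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4
GPU_MEMORY_UTILIZATION=0.85

MODEL_ARGS=$(join_by_comma \
    "model=${MODEL}" \
    "tensor_parallel_size=${TENSOR_PARALLEL_SIZE}" \
    "gpu_memory_utilization=${GPU_MEMORY_UTILIZATION}" \
    "max_pixels=1605632" \
    "min_pixels=784" \
    "max_frame_num=32" \
    "threads=16")

# Dry run: print the value that would be passed to --model_args.
echo "--model_args ${MODEL_ARGS}"
```

Passing "${MODEL_ARGS}" in double quotes keeps the whole value as a single argument to lmms_eval.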

I/O Contract

Inputs

Name	Type	Required	Description
MODEL	String	Yes	Qwen3-VL model identifier from HuggingFace
TENSOR_PARALLEL_SIZE	Integer	No	Number of GPUs for tensor parallelism (default: 1; the script sets 4)
GPU_MEMORY_UTILIZATION	Float	No	GPU memory fraction 0.0-1.0 (default: 0.85)
BATCH_SIZE	Integer	No	Batch size for evaluation (default: 64)
MAX_PIXELS	Integer	No	Maximum image resolution pixels (default: 1605632)
MIN_PIXELS	Integer	No	Minimum image resolution pixels (default: 784)
MAX_FRAME_NUM	Integer	No	Maximum video frames (default: 32)
THREADS	Integer	No	Thread count for visual processing (default: 16)
TASKS	String	Yes	Comma-separated task names (e.g., "mmmu_val,mme")
OUTPUT_PATH	String	Yes	Directory path for output logs
MCP_SERVER_PATH	String	No	Path to MCP server script for tool calling
WORK_DIR	String	No	Working directory for MCP tools (default: /tmp/...)
MAX_TURN	Integer	No	Maximum tool calling turns (default: 5)
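
One way to honor the Required and default columns in a wrapper is shell parameter expansion. This is a sketch under assumptions: the original script may simply hard-code its values, and check_required is a hypothetical helper. The fallback values are the defaults listed in the table.

```shell
# Optional inputs: keep a caller-supplied value, else use the table default.
TENSOR_PARALLEL_SIZE="${TENSOR_PARALLEL_SIZE:-1}"
GPU_MEMORY_UTILIZATION="${GPU_MEMORY_UTILIZATION:-0.85}"
BATCH_SIZE="${BATCH_SIZE:-64}"
MAX_PIXELS="${MAX_PIXELS:-1605632}"
MIN_PIXELS="${MIN_PIXELS:-784}"
MAX_FRAME_NUM="${MAX_FRAME_NUM:-32}"
THREADS="${THREADS:-16}"
MAX_TURN="${MAX_TURN:-5}"

# Hypothetical helper: report every required input that is unset or empty.
check_required() {
    missing=""
    [ -n "${MODEL:-}" ] || missing="${missing} MODEL"
    [ -n "${TASKS:-}" ] || missing="${missing} TASKS"
    [ -n "${OUTPUT_PATH:-}" ] || missing="${missing} OUTPUT_PATH"
    if [ -n "$missing" ]; then
        echo "missing required variables:${missing}" >&2
        return 1
    fi
}
```

Calling check_required before the lmms_eval command lets the wrapper stop early instead of failing after the model has loaded.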

Outputs

Name	Type	Description
Evaluation logs	JSON files	Task evaluation results saved to OUTPUT_PATH
Sample logs	JSON files	Individual sample predictions (if --log_samples enabled)
Console output	Text	Progress and metrics printed to stdout
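
After a run, the JSON outputs can be collected with a short helper. This is a sketch only: the page does not state how files are named or nested under OUTPUT_PATH, so the helper searches recursively, and list_json_outputs is a hypothetical name.

```shell
# Hypothetical helper: list every JSON file below the given output directory.
list_json_outputs() {
    # $1 = directory that was passed as --output_path
    find "$1" -type f -name '*.json' | sort
}
```

For example, list_json_outputs "./logs/qwen3vl_sglang" prints one path per line, or nothing if the run produced no JSON files.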

Usage Examples

Basic Evaluation (Without Tools)

#!/bin/bash
MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4
TASKS="mmmu_val,mme"
BATCH_SIZE=64
OUTPUT_PATH="./logs/qwen3vl_sglang"

uv run python -m lmms_eval \
    --model sglang \
    --model_args model=${MODEL},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},gpu_memory_utilization=0.85,max_pixels=1605632,min_pixels=784,max_frame_num=32,threads=16 \
    --tasks ${TASKS} \
    --batch_size ${BATCH_SIZE} \
    --output_path ${OUTPUT_PATH}

Tool-Enabled Evaluation (With MCP)

#!/bin/bash
MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
MCP_SERVER_PATH="/path/to/mcp_server.py"
WORK_DIR="/tmp/sglang_mcp_work"

uv run python -m lmms_eval \
    --model sglang \
    --model_args model=${MODEL},tensor_parallel_size=4,gpu_memory_utilization=0.85,max_pixels=1605632,min_pixels=784,max_frame_num=32,threads=16,mcp_server_path=${MCP_SERVER_PATH},work_dir=${WORK_DIR},max_turn=5 \
    --tasks mmmu_val \
    --batch_size 1 \
    --output_path ./logs/qwen3vl_with_mcp
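
Before a tool-enabled run it can help to confirm that the MCP server script exists and that the working directory is in place. The check below is a hypothetical pre-flight step, not part of the original script; prepare_mcp is an illustrative name.

```shell
# Hypothetical pre-flight check for the MCP-related arguments.
prepare_mcp() {
    # $1 = path to the MCP server script, $2 = working directory for tools
    if [ ! -f "$1" ]; then
        echo "MCP server script not found: $1" >&2
        return 1
    fi
    # Create the working directory if it does not exist yet.
    mkdir -p "$2"
}
```

A wrapper could call prepare_mcp "${MCP_SERVER_PATH}" "${WORK_DIR}" and launch lmms_eval only if it succeeds.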

Tool Calling Workflow

The script documents the MCP tool calling loop:

1. User sends request with question
2. SGLang processes message and generates text
3. Function call parser detects tool calls (finish_reason == "tool_calls")
4. If tool calls detected:
   a. Parse tool call function name and arguments
   b. Retrieve tool definition from MCPClient
   c. Execute tool via MCPClient.run_tool(tool_name, arguments)
   d. Convert tool result to OpenAI-compatible format
   e. Append tool result as {"role": "tool", ...}
   f. Generate next response with updated context
5. Continue until final text or max_turn reached
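
The control flow of the steps above can be sketched as a bounded loop. Everything here is a stand-in: generate and run_tool are hypothetical stubs that imitate the model and MCPClient.run_tool, and the transcript records only message roles. It illustrates the stopping conditions, not the actual SGLang or MCP implementation.

```shell
MAX_TURN=5

# Stub for model generation: request a tool on the first two turns, then answer.
generate() {
    # $1 = zero-based turn index; prints a finish reason
    if [ "$1" -lt 2 ]; then
        echo "tool_calls"
    else
        echo "stop"
    fi
}

# Stub for tool execution (stands in for MCPClient.run_tool).
run_tool() {
    echo "ok"
}

run_loop() {
    turn=0
    transcript="user"
    while [ "$turn" -lt "$MAX_TURN" ]; do
        finish_reason=$(generate "$turn")
        if [ "$finish_reason" != "tool_calls" ]; then
            # Final text answer: record it and stop.
            transcript="${transcript} assistant"
            break
        fi
        # Tool call detected: execute it and append the result as a tool message.
        tool_result=$(run_tool "demo_tool")
        transcript="${transcript} tool:${tool_result}"
        turn=$((turn + 1))
    done
    echo "$transcript"
}

run_loop
```

The loop ends either when the model returns plain text or when MAX_TURN tool rounds have been used, whichever comes first.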
