Implementation:EvolvingLMMs Lab Lmms eval sglang qwen3vl script

From Leeroopedia
Revision as of 12:32, 16 February 2026 by Admin (auto-imported from implementations/EvolvingLMMs_Lab_Lmms_eval_sglang_qwen3vl_script.md)
Knowledge Sources
Domains Model Evaluation, SGLang Backend, Vision Language Models
Last Updated 2026-02-14 00:00 GMT

Overview

Bash script for evaluating Qwen3-VL models with the SGLang backend, with optional tool calling through the Model Context Protocol (MCP).

Description

This example script shows how to evaluate Qwen3-VL vision-language models with lmms-eval using the SGLang inference backend. It supports both plain evaluation and tool-enabled evaluation through the Model Context Protocol (MCP). Its settings cover parallelism (tensor parallel size), memory use (GPU memory fraction, image pixel limits, video frame count), and task selection, and its comments document the parameters, the tool-calling workflow, and recommended practices.

Usage

Use this script to evaluate Qwen3-VL models (the 30B or 235B variants) with accelerated inference through SGLang. It is particularly useful on multi-GPU setups that rely on tensor parallelism, and when the model needs tool-calling capabilities provided by MCP servers.

Code Reference

Source Location

Key Configuration Variables

MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4
GPU_MEMORY_UTILIZATION=0.85
BATCH_SIZE=64
MAX_PIXELS=1605632
MIN_PIXELS=784
MAX_FRAME_NUM=32
THREADS=16
TASKS="mmmu_val,mme"
OUTPUT_PATH="./logs/qwen3vl_sglang"
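The values above are hard-coded. A minimal sketch, assuming you want to override them per run without editing the file, is to assign each one only when it is not already set (and non-empty) in the environment, using POSIX `${VAR:=default}` expansion. The function name `load_config` is hypothetical, and the comments on the pixel settings record only the arithmetic (1605632 = 2048 × 28 × 28, 784 = 28 × 28), not how the model interprets those limits.

```shell
# Sketch: take each setting from the environment when it is set and non-empty,
# otherwise fall back to the value hard-coded in the script above.
load_config() {
  : "${MODEL:=Qwen/Qwen3-VL-30B-A3B-Instruct}"
  : "${TENSOR_PARALLEL_SIZE:=4}"
  : "${GPU_MEMORY_UTILIZATION:=0.85}"
  : "${BATCH_SIZE:=64}"
  : "${MAX_PIXELS:=1605632}"   # = 2048 * 28 * 28
  : "${MIN_PIXELS:=784}"       # = 28 * 28
  : "${MAX_FRAME_NUM:=32}"
  : "${THREADS:=16}"
  : "${TASKS:=mmmu_val,mme}"
  : "${OUTPUT_PATH:=./logs/qwen3vl_sglang}"
}

load_config
echo "Evaluating ${MODEL} on ${TASKS} (tensor_parallel_size=${TENSOR_PARALLEL_SIZE})"
```

If the script calls `load_config` before the lmms_eval command, a run such as `TASKS=mme TENSOR_PARALLEL_SIZE=8 bash run_qwen3vl_sglang.sh` (script name illustrative) changes only those two settings.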

Main Command Structure

uv run python -m lmms_eval \
    --model sglang \
    --model_args "model=${MODEL},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},gpu_memory_utilization=${GPU_MEMORY_UTILIZATION},max_pixels=${MAX_PIXELS},min_pixels=${MIN_PIXELS},max_frame_num=${MAX_FRAME_NUM},threads=${THREADS}" \
    --tasks "${TASKS}" \
    --batch_size "${BATCH_SIZE}" \
    --output_path "${OUTPUT_PATH}"
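`--model_args` takes a single comma-separated list of key=value pairs, as in the command above, so a value that itself contains a comma would presumably be split in two. The hypothetical helper `build_model_args` assembles that string from separate arguments and refuses malformed pairs; it is a sketch, not part of lmms-eval.

```shell
# Sketch (hypothetical helper): join key=value pairs into the comma-separated
# string passed to --model_args, rejecting pairs without "=" and values
# that contain a comma.
build_model_args() {
  args=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "not a key=value pair: $pair" >&2; return 1 ;;
    esac
    # ${pair#*=} is everything after the first "=", i.e. the value.
    case "${pair#*=}" in
      *,*) echo "value contains a comma: $pair" >&2; return 1 ;;
    esac
    args="${args:+$args,}$pair"
  done
  printf '%s\n' "$args"
}

MODEL_ARGS=$(build_model_args \
  "model=Qwen/Qwen3-VL-30B-A3B-Instruct" \
  "tensor_parallel_size=4" \
  "gpu_memory_utilization=0.85") || exit 1
echo "$MODEL_ARGS"
# prints: model=Qwen/Qwen3-VL-30B-A3B-Instruct,tensor_parallel_size=4,gpu_memory_utilization=0.85
```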

I/O Contract

Inputs

Name | Type | Required | Description
MODEL | String | Yes | Hugging Face model ID of the Qwen3-VL checkpoint
TENSOR_PARALLEL_SIZE | Integer | No | Number of GPUs used for tensor parallelism (default: 1)
GPU_MEMORY_UTILIZATION | Float | No | Fraction of GPU memory to use, 0.0-1.0 (default: 0.85)
BATCH_SIZE | Integer | No | Evaluation batch size (default: 64)
MAX_PIXELS | Integer | No | Maximum total pixel count per image (default: 1605632)
MIN_PIXELS | Integer | No | Minimum total pixel count per image (default: 784)
MAX_FRAME_NUM | Integer | No | Maximum number of video frames (default: 32)
THREADS | Integer | No | Number of threads for visual processing (default: 16)
TASKS | String | Yes | Comma-separated task names (e.g., "mmmu_val,mme")
OUTPUT_PATH | String | Yes | Output directory for logs and results
MCP_SERVER_PATH | String | No | Path to the MCP server script that enables tool calling
WORK_DIR | String | No | Working directory for MCP tools (default: /tmp/...)
MAX_TURN | Integer | No | Maximum number of tool-calling turns (default: 5)
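The table above can double as a pre-flight checklist. A minimal sketch, assuming the inputs are already set as shell variables: required inputs must be non-empty, integer inputs (when set) must be positive whole numbers, GPU_MEMORY_UTILIZATION (when set) must lie in (0, 1], and MIN_PIXELS must not exceed MAX_PIXELS. `validate_inputs` and `is_pos_int` are hypothetical names, and these ranges are assumptions inferred from the descriptions, not checks the script is known to perform.

```shell
# Sketch: pre-flight checks derived from the Inputs table.
# The accepted ranges are assumptions, not rules enforced by lmms-eval.

# Succeeds if $1 is a whole number greater than zero.
is_pos_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

validate_inputs() {
  # Required inputs must be non-empty.
  for name in MODEL TASKS OUTPUT_PATH; do
    eval "val=\${$name:-}"
    if [ -z "$val" ]; then
      echo "missing required input: $name" >&2
      return 1
    fi
  done
  # Optional integer inputs are checked only when set.
  for name in TENSOR_PARALLEL_SIZE BATCH_SIZE MAX_PIXELS MIN_PIXELS \
              MAX_FRAME_NUM THREADS MAX_TURN; do
    eval "val=\${$name:-}"
    if [ -n "$val" ] && ! is_pos_int "$val"; then
      echo "$name must be a positive integer, got '$val'" >&2
      return 1
    fi
  done
  # GPU memory fraction, when set, must be a number in (0, 1].
  gmu=${GPU_MEMORY_UTILIZATION:-}
  if [ -n "$gmu" ] && ! awk -v x="$gmu" \
      'BEGIN { exit !(x ~ /^([0-9]+\.?[0-9]*|\.[0-9]+)$/ && x + 0 > 0 && x + 0 <= 1) }'; then
    echo "GPU_MEMORY_UTILIZATION must be in (0, 1], got '$gmu'" >&2
    return 1
  fi
  # The pixel bounds must be ordered.
  if [ -n "${MIN_PIXELS:-}" ] && [ -n "${MAX_PIXELS:-}" ] \
      && [ "$MIN_PIXELS" -gt "$MAX_PIXELS" ]; then
    echo "MIN_PIXELS ($MIN_PIXELS) exceeds MAX_PIXELS ($MAX_PIXELS)" >&2
    return 1
  fi
}
```

Calling `validate_inputs || exit 1` before the lmms_eval command stops the run with a readable message instead of a failure deep inside model loading.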

Outputs

Name | Type | Description
Evaluation logs | JSON files | Task evaluation results saved under OUTPUT_PATH
Sample logs | JSON files | Per-sample predictions (only when --log_samples is passed)
Console output | Text | Progress and metrics printed to stdout
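To see what a run produced, a minimal sketch is to list the result files found under OUTPUT_PATH. This page does not describe lmms-eval's file-naming scheme, so the sketch matches by extension only; including `.jsonl` is an assumption in case per-sample logs use that format. `list_result_files` is a hypothetical helper.

```shell
# Sketch: list the result files an evaluation run wrote under OUTPUT_PATH.
# Matching is by extension only (.json and .jsonl); file names are not assumed.
list_result_files() {
  dir=$1
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "not a directory: '$dir'" >&2
    return 1
  fi
  find "$dir" -type f \( -name '*.json' -o -name '*.jsonl' \) | sort
}

list_result_files "${OUTPUT_PATH:-./logs/qwen3vl_sglang}" \
  || echo "no results yet; run the evaluation first" >&2
```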

Usage Examples

Basic Evaluation (Without Tools)

#!/bin/bash
# Basic evaluation on the mmmu_val and mme tasks, without tool calling.
set -euo pipefail

MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
TENSOR_PARALLEL_SIZE=4   # number of GPUs for tensor parallelism
TASKS="mmmu_val,mme"
BATCH_SIZE=64
OUTPUT_PATH="./logs/qwen3vl_sglang"

uv run python -m lmms_eval \
    --model sglang \
    --model_args "model=${MODEL},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},gpu_memory_utilization=0.85,max_pixels=1605632,min_pixels=784,max_frame_num=32,threads=16" \
    --tasks "${TASKS}" \
    --batch_size "${BATCH_SIZE}" \
    --output_path "${OUTPUT_PATH}"
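Tensor parallelism shards the model across GPUs, so a run with `tensor_parallel_size=4` needs at least four visible GPUs. A minimal sketch of a pre-launch check, assuming NVIDIA GPUs: it counts the entries in `CUDA_VISIBLE_DEVICES` when that variable is set, and otherwise falls back to `nvidia-smi -L`. Both function names are hypothetical.

```shell
# Sketch: make sure at least TENSOR_PARALLEL_SIZE GPUs are visible before
# launching. Assumes NVIDIA GPUs; nvidia-smi is used only as a fallback.

# Prints the number of GPUs this process can see.
visible_gpu_count() {
  if [ "${CUDA_VISIBLE_DEVICES+set}" = set ]; then
    # CUDA_VISIBLE_DEVICES is set: count its comma-separated entries
    # (an empty value hides every GPU, giving 0).
    printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
  elif command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi -L prints one "GPU <n>: ..." line per device.
    nvidia-smi -L | grep -c '^GPU'
  else
    echo 0
  fi
}

# Fails if the requested tensor parallel size exceeds the visible GPU count.
check_tp_size() {
  tp=$1
  gpus=$(visible_gpu_count)
  if [ "$tp" -gt "$gpus" ]; then
    echo "tensor_parallel_size=$tp but only $gpus GPU(s) visible" >&2
    return 1
  fi
}
```

In the basic script, `check_tp_size "$TENSOR_PARALLEL_SIZE" || exit 1` placed just before `uv run` would stop early rather than leaving the mismatch to the backend.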

Tool-Enabled Evaluation (With MCP)

#!/bin/bash
# Evaluation with MCP tool calling enabled.
set -euo pipefail

MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct"
MCP_SERVER_PATH="/path/to/mcp_server.py"   # MCP server script that provides the tools
WORK_DIR="/tmp/sglang_mcp_work"            # working directory for MCP tools

# max_turn=5 caps the number of tool-calling turns.
uv run python -m lmms_eval \
    --model sglang \
    --model_args "model=${MODEL},tensor_parallel_size=4,gpu_memory_utilization=0.85,max_pixels=1605632,min_pixels=784,max_frame_num=32,threads=16,mcp_server_path=${MCP_SERVER_PATH},work_dir=${WORK_DIR},max_turn=5" \
    --tasks mmmu_val \
    --batch_size 1 \
    --output_path ./logs/qwen3vl_with_mcp
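Because the tool-enabled run depends on an external server script and a working directory, a pre-flight check can catch a wrong path before the model is loaded. A minimal sketch; `check_mcp_inputs` is a hypothetical helper, and creating WORK_DIR up front is an assumption (this page does not say whether the script creates it itself).

```shell
# Sketch: pre-flight checks for the MCP-specific inputs.
# Arguments: MCP_SERVER_PATH, WORK_DIR, and optionally MAX_TURN (default 5).
check_mcp_inputs() {
  server=$1
  workdir=$2
  turns=${3:-5}
  # The MCP server script must exist and be readable.
  if [ ! -f "$server" ] || [ ! -r "$server" ]; then
    echo "MCP server script not found or unreadable: $server" >&2
    return 1
  fi
  # Create the tools' working directory if it does not exist yet.
  if ! mkdir -p "$workdir"; then
    echo "cannot create WORK_DIR: $workdir" >&2
    return 1
  fi
  # max_turn must be a positive whole number.
  case "$turns" in
    ''|*[!0-9]*) echo "max_turn must be a positive integer, got '$turns'" >&2; return 1 ;;
  esac
  if [ "$turns" -lt 1 ]; then
    echo "max_turn must be at least 1, got '$turns'" >&2
    return 1
  fi
}
```

For example, `check_mcp_inputs "$MCP_SERVER_PATH" "$WORK_DIR" 5 || exit 1` before `uv run`.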

Tool Calling Workflow

The script's comments describe the MCP tool-calling loop:

1. The user sends a request containing the question.
2. SGLang processes the messages and generates a response.
3. The function-call parser detects tool calls (finish_reason == "tool_calls").
4. If tool calls are detected:
   a. Parse the tool call's function name and arguments.
   b. Retrieve the tool definition from MCPClient.
   c. Execute the tool via MCPClient.run_tool(tool_name, arguments).
   d. Convert the tool result to the OpenAI-compatible message format.
   e. Append the tool result as {"role": "tool", ...}.
   f. Generate the next response with the updated context.
5. Repeat until the model returns a final text answer or max_turn is reached.
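The loop above can be traced with a shell sketch that models only its control flow. `generate_turn` and `run_tool` are hypothetical stubs standing in for SGLang generation and `MCPClient.run_tool`; the stubbed model requests a tool on its first two turns and then answers. The appended message follows the OpenAI Chat Completions shape for tool results (`role`, `tool_call_id`, `content`). The actual implementation is not shown on this page.

```shell
# Sketch of the control flow only; generate_turn and run_tool are
# hypothetical stand-ins for SGLang generation and MCPClient.run_tool.
MAX_TURN=${MAX_TURN:-5}

# Stub model: requests a tool on turns 0 and 1, then gives a final answer.
generate_turn() {
  if [ "$1" -lt 2 ]; then echo "tool_calls"; else echo "stop"; fi
}

# Stub tool call: prints an OpenAI-style tool message
# (no JSON escaping; the stub values are plain text).
run_tool() {
  printf '{"role": "tool", "tool_call_id": "%s", "content": "%s"}\n' "$1" "$2"
}

# Runs at most MAX_TURN tool-calling rounds; tool messages accumulate in $messages.
tool_loop() {
  turn=0
  messages=""
  while [ "$turn" -lt "$MAX_TURN" ]; do
    finish_reason=$(generate_turn "$turn")
    if [ "$finish_reason" != "tool_calls" ]; then
      echo "final answer after $turn tool turn(s)"
      return 0
    fi
    # Append the tool result so the next generation sees it.
    messages="$messages$(run_tool "call_$turn" "result $turn")
"
    turn=$((turn + 1))
  done
  echo "stopped: max_turn ($MAX_TURN) reached"
}

tool_loop
# with the default MAX_TURN=5, prints: final answer after 2 tool turn(s)
```

Setting MAX_TURN=2 instead makes the loop stop at the cap before the stub model produces its final answer, which mirrors the max_turn exit in step 5.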
