Implementation:Mit han lab Llm awq InternVL Benchmark

From Leeroopedia
Knowledge Sources
Domains Benchmarking, Multimodal
Last Updated 2026-02-15 00:00 GMT

Overview

CLI benchmarking script for InternVL3 vision-language models, evaluating performance across four multimodal tasks (video captioning, video QA, image captioning, image QA) with optional LLM and vision tower quantization.

Description

This script provides a command-line interface for benchmarking InternVL3 models on four standard multimodal tasks. It measures inference performance including throughput and latency via the model's built-in benchmark method.

The main function begins by disabling PyTorch parameter initialization (kaiming_uniform_, kaiming_normal_, uniform_, normal_) and HuggingFace _init_weights. This speeds up model loading, since freshly initialized values would be overwritten by the checkpoint weights anyway. It also sets PYTORCH_CUDA_ALLOC_CONF to "expandable_segments:True", a CUDA caching-allocator setting intended to reduce GPU memory fragmentation.

The model is loaded as InternVL3 from tinychat.models, instantiated from an AutoConfig with resume_path set to the model checkpoint directory. The model is cast to half precision. When --quant_llm or --all is specified, the LLM backbone undergoes W4A16 quantization via real_quantize_model_weight (4-bit weights, group size 128, zero-point enabled) with init_only=True, followed by fused kernel replacements: make_quant_attn, make_quant_norm, and make_fused_mlp. When --quant_VT or --all is specified, the vision model encoder is wrapped with QuantInternVisionEncoder and optionally compiled with torch.compile.
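To make "4-bit weights, group size 128, zero-point enabled" concrete, here is a pure-Python sketch of asymmetric quantization for a single group of weights. It illustrates the general scheme only and is not the code of real_quantize_model_weight; the function names `quantize_group` and `dequantize_group` are hypothetical, and the sample weights are made up for the demonstration.

```python
def quantize_group(weights, n_bits=4):
    """Map one group of floats to integer codes in [0, 2**n_bits - 1]."""
    q_max = 2 ** n_bits - 1                        # 15 for 4-bit weights
    w_min, w_max = min(weights), max(weights)
    scale = max(w_max - w_min, 1e-5) / q_max       # step between adjacent codes
    zero = min(max(round(-w_min / scale), 0), q_max)  # integer zero-point
    codes = [min(max(round(w / scale) + zero, 0), q_max) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Recover approximate floats from the codes, scale, and zero-point."""
    return [(c - zero) * scale for c in codes]


GROUP_SIZE = 128
# Made-up weights that straddle zero: 0, -1/128, 2/128, -3/128, ...
group = [((-1) ** i) * i / GROUP_SIZE for i in range(GROUP_SIZE)]

codes, scale, zero = quantize_group(group)
restored = dequantize_group(codes, scale, zero)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

Each group of 128 weights shares one scale and one zero-point, so the storage overhead of those two values is amortized over the group. Activations stay in 16-bit floating point, which is what the "A16" in W4A16 refers to.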

Each benchmark task constructs a prompt combining media (Image or Video from llava.media) with a task-specific text query, resets the conversation template via clib.conv_templates, and calls model.benchmark(prompt, quant_llm) under torch.no_grad(). The four tasks are:

  • video_caption: "Elaborate on the visual and narrative elements of the video in detail."
  • video_QA: Multiple-choice question about video content.
  • image_caption: "Describe the image in detail."
  • image_QA: Multiple-choice question about image text content.
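The per-task loop described above can be sketched as below. Several things here are assumptions for illustration: `Image` and `Video` are local stand-ins for the llava.media classes, the media-then-text ordering of the prompt is a guess, the QA queries are placeholders because their exact wording is not given here, and `fake_benchmark` stands in for model.benchmark. The conversation-template reset and the torch.no_grad() context are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Stand-in for llava.media.Image (assumed to wrap a file path)."""
    path: str


@dataclass(frozen=True)
class Video:
    """Stand-in for llava.media.Video (assumed to wrap a file path)."""
    path: str


# Task name -> (media type, text query). QA queries are placeholders.
TASKS = {
    "video_caption": (Video, "Elaborate on the visual and narrative elements of the video in detail."),
    "video_QA": (Video, "<multiple-choice question about the video>"),
    "image_caption": (Image, "Describe the image in detail."),
    "image_QA": (Image, "<multiple-choice question about the image text>"),
}


def build_prompt(task, video_path, image_path):
    """Pair the task's media object with its text query."""
    media_cls, query = TASKS[task]
    path = video_path if media_cls is Video else image_path
    return [media_cls(path), query]


def run_tasks(benchmark_fn, selected, video_path, image_path, quant_llm=False):
    """Call benchmark_fn(prompt, quant_llm) once per selected task."""
    return {
        task: benchmark_fn(build_prompt(task, video_path, image_path), quant_llm)
        for task in selected
    }


def fake_benchmark(prompt, quant_llm):
    """Hypothetical stand-in that just echoes what it was given."""
    return (type(prompt[0]).__name__, quant_llm)


results = run_tasks(fake_benchmark, ["image_caption", "video_QA"],
                    "demo.mp4", "demo.jpg", quant_llm=True)
```

Keeping the tasks in a single table makes it easy to see that each media file is reused by two tasks: one captioning query and one multiple-choice query.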

Usage

Run from the command line to benchmark InternVL3 with various quantization configurations:

# Benchmark all tasks with full quantization
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --quant_path /path/to/quant.pt \
    --all

# Benchmark only image tasks without quantization
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --image_caption --image_QA

# Benchmark with LLM quantization only
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --quant_llm --all_task

Code Reference

Source Location

Signature

def main() -> None:

Import

# CLI script, run directly:
python tinychat/internvl_benchmark.py [OPTIONS]

I/O Contract

CLI Arguments

Argument | Type | Default | Description
--model-path, -m | str | (required) | Path to InternVL3 model checkpoint
--quant_path | str | /PATH/TO/QUANT | Path to quantized weight file
--conv-mode, -c | str | auto | Conversation template mode
--device | str | cuda:0 | CUDA device
--act_scale_path | str | /PATH/TO/SCALE | Path to activation scales
--quant_llm | flag | False | Quantize the LLM backbone (W4A16)
--quant_VT | flag | False | Quantize the vision tower
--video_caption | flag | False | Run video captioning benchmark
--video_QA | flag | False | Run video QA benchmark
--image_caption | flag | False | Run image captioning benchmark
--image_QA | flag | False | Run image QA benchmark
--all | flag | False | Enable all quantization and all tasks
--all_task | flag | False | Run all four benchmark tasks
--fakequant_VT | flag | False | Use fake quantization for vision tower
--video_path | str | ../figures/nvila_demo_video.mp4 | Path to benchmark video
--image_path | str | ../figures/vila-logo.jpg | Path to benchmark image
--max_seq_len | int | 8192 | Maximum sequence length
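How the umbrella flags --all and --all_task combine with the individual flags can be sketched with argparse. This is a reading of the argument descriptions above, not the script's actual parser: only a subset of the arguments is declared, and the helper names `build_parser` and `resolve` are hypothetical.

```python
import argparse

TASK_FLAGS = ("video_caption", "video_QA", "image_caption", "image_QA")


def build_parser():
    """Declare a subset of the documented arguments (sketch only)."""
    parser = argparse.ArgumentParser(description="InternVL3 benchmark (sketch)")
    parser.add_argument("--model-path", "-m", type=str, required=True)
    parser.add_argument("--quant_path", type=str, default="/PATH/TO/QUANT")
    for flag in ("quant_llm", "quant_VT", "all", "all_task", *TASK_FLAGS):
        parser.add_argument(f"--{flag}", action="store_true")
    return parser


def resolve(args):
    """Expand umbrella flags into concrete quantization and task choices."""
    quant_llm = args.quant_llm or args.all     # --all implies LLM quantization
    quant_vt = args.quant_VT or args.all       # --all implies vision tower quantization
    run_all = args.all or args.all_task        # either flag selects every task
    tasks = [t for t in TASK_FLAGS if run_all or getattr(args, t)]
    return quant_llm, quant_vt, tasks


parser = build_parser()
full = resolve(parser.parse_args(["-m", "/path/to/internvl3", "--all"]))
llm_only = resolve(parser.parse_args(["-m", "/path/to/internvl3", "--quant_llm", "--all_task"]))
image_only = resolve(parser.parse_args(["-m", "/path/to/internvl3", "--image_caption", "--image_QA"]))
```

Under this reading, --all_task on its own benchmarks every task on the unquantized model, while --all additionally switches on both quantization paths.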

Output

Output | Description
stdout | Benchmark results printed per task with separator lines; includes model.benchmark() output (timing/throughput metrics)

Usage Examples

# Full benchmark with all quantization options
python tinychat/internvl_benchmark.py \
    --model-path /models/internvl3-8b \
    --quant_path /models/internvl3-8b-w4-g128-awq.pt \
    --all \
    --video_path /data/test_video.mp4 \
    --image_path /data/test_image.jpg

# Quick image-only benchmark without quantization
python tinychat/internvl_benchmark.py \
    --model-path /models/internvl3-8b \
    --image_caption \
    --image_QA
