Implementation:Mit han lab Llm awq InternVL Benchmark

Knowledge Sources	Mit_han_lab_Llm_awq
Domains	Benchmarking, Multimodal
Last Updated	2026-02-15 00:00 GMT

Overview

CLI benchmarking script for InternVL3 vision-language models, evaluating performance across four multimodal tasks (video captioning, video QA, image captioning, image QA) with optional LLM and vision tower quantization.

Description

This script provides a command-line interface for benchmarking InternVL3 models on four multimodal tasks: video captioning, video QA, image captioning, and image QA. Throughput and latency figures come from the model's built-in benchmark method, which the script calls once for each selected task.

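The main function first replaces PyTorch's parameter initializers (kaiming_uniform_, kaiming_normal_, uniform_, normal_) and the HuggingFace _init_weights hook with no-ops. Random initialization is wasted work when every weight is subsequently loaded from a checkpoint, so skipping it shortens model loading. The function then sets PYTORCH_CUDA_ALLOC_CONF to "expandable_segments:True", which selects the expandable-segments mode of PyTorch's CUDA caching allocator, a setting intended to reduce GPU memory fragmentation.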
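The start-up steps described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: disable_initializers is a hypothetical helper, and a SimpleNamespace stands in for torch.nn.init so that the snippet runs without PyTorch installed. In the real script the same kind of patching would target torch.nn.init and the HuggingFace _init_weights hook.

```python
import os
from types import SimpleNamespace

INIT_NAMES = ("kaiming_uniform_", "kaiming_normal_", "uniform_", "normal_")


def disable_initializers(init_module, names=INIT_NAMES):
    """Replace each named initializer on init_module with a no-op.

    Returns the original functions so they can be restored later.
    """
    def skip(tensor=None, *args, **kwargs):
        # Leave the tensor untouched; checkpoint weights overwrite it anyway.
        return tensor

    originals = {}
    for name in names:
        originals[name] = getattr(init_module, name)
        setattr(init_module, name, skip)
    return originals


# Stand-in for torch.nn.init (assumption: used only to keep this runnable).
# Each fake initializer records its own name when called.
calls = []
fake_init = SimpleNamespace(
    **{name: (lambda t, *a, _n=name, **k: calls.append(_n) or t) for name in INIT_NAMES}
)

originals = disable_initializers(fake_init)
fake_init.kaiming_uniform_([0.0, 0.0])  # now a no-op, so nothing is recorded

# Allocator setting named in the description. It is typically placed in the
# environment before the first CUDA allocation so that PyTorch picks it up.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

Returning the originals is a design choice of this sketch; it lets a caller restore normal initialization after loading, which the description does not say the script does.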

The model is loaded as InternVL3 from tinychat.models, instantiated from an AutoConfig with resume_path set to the model checkpoint directory, and then cast to half precision.

When --quant_llm or --all is specified, the LLM backbone undergoes W4A16 quantization via real_quantize_model_weight (4-bit weights, group size 128, zero-point enabled) with init_only=True. Fused kernel replacements follow: make_quant_attn, make_quant_norm, and make_fused_mlp.

When --quant_VT or --all is specified, the vision model encoder is wrapped with QuantInternVisionEncoder and optionally compiled with torch.compile.
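To make the quantization parameters concrete, here is a small pure-Python sketch of asymmetric (zero-point) 4-bit quantization applied per group of 128 weights. It illustrates the general scheme only and is not the code of real_quantize_model_weight; the function names quantize_group and quantize_row are hypothetical, and the real implementation operates on tensors and packs the 4-bit codes for its kernels. In W4A16, only the weights are stored this way, while activations stay in 16-bit floating point.

```python
def quantize_group(weights, n_bits=4):
    """Quantize one group of weights to unsigned n_bits codes with a zero point.

    Returns (codes, scale, zero, dequantized_weights).
    """
    q_max = 2 ** n_bits - 1                      # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = max(w_max - w_min, 1e-5) / q_max     # guard against a zero range
    zero = min(max(round(-w_min / scale), 0), q_max)
    codes = [min(max(round(w / scale) + zero, 0), q_max) for w in weights]
    dequant = [(c - zero) * scale for c in codes]
    return codes, scale, zero, dequant


def quantize_row(row, group_size=128, n_bits=4):
    """Split a weight row into groups and quantize each with its own scale/zero."""
    return [
        quantize_group(row[start:start + group_size], n_bits)
        for start in range(0, len(row), group_size)
    ]


# Deterministic toy weights in [-0.5, 0.5]; 256 values give two groups of 128.
row = [((i * 37) % 101 - 50) / 100 for i in range(256)]
groups = quantize_row(row)

for index, (codes, scale, zero, dequant) in enumerate(groups):
    chunk = row[index * 128:(index + 1) * 128]
    worst = max(abs(a - b) for a, b in zip(chunk, dequant))
    print(f"group {index}: scale={scale:.5f} zero={zero} max_error={worst:.5f}")
```

Each group carries its own scale and zero point, so a smaller group size tracks local weight ranges more closely at the cost of storing more metadata.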

Each benchmark task constructs a prompt combining media (Image or Video from llava.media) with a task-specific text query, resets the conversation template via clib.conv_templates, and calls model.benchmark(prompt, quant_llm) under torch.no_grad(). The four tasks are:

video_caption: "Elaborate on the visual and narrative elements of the video in detail."
video_QA: Multiple-choice question about video content.
image_caption: "Describe the image in detail."
image_QA: Multiple-choice question about image text content.
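The per-task flow can be sketched as below. Everything here is a stand-in under stated assumptions: Media is a hypothetical substitute for the Image and Video classes from llava.media, StubModel replaces the real InternVL3 model, and the two QA queries are placeholders because the exact question text is not given on this page. Only the caption queries and the benchmark(prompt, quant_llm) call shape come from the description.

```python
from dataclasses import dataclass


@dataclass
class Media:
    """Hypothetical stand-in for llava.media.Image / llava.media.Video."""
    kind: str  # "image" or "video"
    path: str


# Caption queries are quoted from the task list; the QA entries are
# placeholders, not the script's actual questions.
TASK_QUERIES = {
    "video_caption": ("video", "Elaborate on the visual and narrative elements of the video in detail."),
    "video_QA": ("video", "<multiple-choice question about the video>"),
    "image_caption": ("image", "Describe the image in detail."),
    "image_QA": ("image", "<multiple-choice question about text in the image>"),
}


def build_prompt(task, video_path, image_path):
    """Return [media, query] for one task."""
    kind, query = TASK_QUERIES[task]
    path = video_path if kind == "video" else image_path
    return [Media(kind, path), query]


class StubModel:
    """Stand-in exposing the benchmark(prompt, quant_llm) call described above."""

    def __init__(self):
        self.seen = []

    def benchmark(self, prompt, quant_llm):
        self.seen.append((prompt[0].kind, quant_llm))


def run_tasks(model, tasks, video_path, image_path, quant_llm=False):
    for task in tasks:
        prompt = build_prompt(task, video_path, image_path)
        print("-" * 40)
        print(task)
        # The real script resets the conversation template and wraps this
        # call in torch.no_grad(); both steps are omitted in this stub.
        model.benchmark(prompt, quant_llm)


model = StubModel()
run_tasks(model, ["video_caption", "image_QA"], "demo.mp4", "demo.jpg", quant_llm=True)
```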

Usage

Run from the command line to benchmark InternVL3 with various quantization configurations:

# Benchmark all tasks with full quantization
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --quant_path /path/to/quant.pt \
    --all

# Benchmark only image tasks without quantization
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --image_caption --image_QA

# Benchmark with LLM quantization only
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --quant_llm --all_task

Code Reference

Source Location

Repository: Mit_han_lab_Llm_awq
File: tinychat/internvl_benchmark.py
Lines: 1-167

Signature

def main() -> None:

Import

# CLI script, run directly:
python tinychat/internvl_benchmark.py [OPTIONS]

I/O Contract

CLI Arguments

Argument	Type	Default	Description
--model-path, -m	str	(required)	Path to InternVL3 model checkpoint
--quant_path	str	/PATH/TO/QUANT	Path to quantized weight file
--conv-mode, -c	str	auto	Conversation template mode
--device	str	cuda:0	CUDA device
--act_scale_path	str	/PATH/TO/SCALE	Path to activation scales
--quant_llm	flag	False	Quantize the LLM backbone (W4A16)
--quant_VT	flag	False	Quantize the vision tower
--video_caption	flag	False	Run video captioning benchmark
--video_QA	flag	False	Run video QA benchmark
--image_caption	flag	False	Run image captioning benchmark
--image_QA	flag	False	Run image QA benchmark
--all	flag	False	Enable all quantization and all tasks
--all_task	flag	False	Run all four benchmark tasks
--fakequant_VT	flag	False	Use fake quantization for vision tower
--video_path	str	../figures/nvila_demo_video.mp4	Path to benchmark video
--image_path	str	../figures/vila-logo.jpg	Path to benchmark image
--max_seq_len	int	8192	Maximum sequence length
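The table above maps onto an argparse parser along the following lines. This is a reconstruction from the table, not a copy of the script: build_parser and resolve are hypothetical names, and the way --all and --all_task expand is inferred from their descriptions.

```python
import argparse

TASKS = ("video_caption", "video_QA", "image_caption", "image_QA")


def build_parser():
    p = argparse.ArgumentParser(description="InternVL3 benchmark (sketch)")
    p.add_argument("--model-path", "-m", type=str, required=True)
    p.add_argument("--quant_path", type=str, default="/PATH/TO/QUANT")
    p.add_argument("--conv-mode", "-c", type=str, default="auto")
    p.add_argument("--device", type=str, default="cuda:0")
    p.add_argument("--act_scale_path", type=str, default="/PATH/TO/SCALE")
    # Boolean switches: absent means False, present means True.
    for flag in ("quant_llm", "quant_VT", *TASKS, "all", "all_task", "fakequant_VT"):
        p.add_argument(f"--{flag}", action="store_true")
    p.add_argument("--video_path", type=str, default="../figures/nvila_demo_video.mp4")
    p.add_argument("--image_path", type=str, default="../figures/vila-logo.jpg")
    p.add_argument("--max_seq_len", type=int, default=8192)
    return p


def resolve(args):
    """Expand --all and --all_task into per-component decisions."""
    quant_llm = args.quant_llm or args.all
    quant_vt = args.quant_VT or args.all
    run_all = args.all or args.all_task
    tasks = [t for t in TASKS if run_all or getattr(args, t)]
    return quant_llm, quant_vt, tasks


args = build_parser().parse_args(["-m", "/path/to/internvl3", "--quant_llm", "--all_task"])
print(resolve(args))
```

Note that argparse turns the hyphen in --model-path into an underscore, so the value is read as args.model_path.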

Output

Output	Description
stdout	Benchmark results printed per task with separator lines; includes model.benchmark() output (timing/throughput metrics)
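This page does not specify how model.benchmark() computes its figures. As a general illustration of the two quantities mentioned, the sketch below times a callable and derives mean latency and token throughput. The names measure and fake_generate are hypothetical, and the workload is simulated with a short sleep.

```python
import time


def measure(generate, n_runs=3):
    """Time a callable that returns the number of tokens it produced.

    Returns (mean latency in seconds per run, throughput in tokens per second).
    """
    total_time = 0.0
    total_tokens = 0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_time += time.perf_counter() - start
    latency = total_time / n_runs
    throughput = total_tokens / total_time if total_time > 0 else float("inf")
    return latency, throughput


def fake_generate():
    # Hypothetical workload: pretend to decode 16 tokens.
    time.sleep(0.01)
    return 16


latency, throughput = measure(fake_generate)
print(f"latency: {latency * 1000:.1f} ms/run, throughput: {throughput:.1f} tokens/s")
```

When timing real GPU work, the device should be synchronized (for example with torch.cuda.synchronize()) before each clock reading, since CUDA kernels are launched asynchronously.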

Usage Examples

# Full benchmark with all quantization options
python tinychat/internvl_benchmark.py \
    --model-path /models/internvl3-8b \
    --quant_path /models/internvl3-8b-w4-g128-awq.pt \
    --all \
    --video_path /data/test_video.mp4 \
    --image_path /data/test_image.jpg

# Quick image-only benchmark without quantization
python tinychat/internvl_benchmark.py \
    --model-path /models/internvl3-8b \
    --image_caption \
    --image_QA

Related Pages

Principle:Mit_han_lab_Llm_awq_VLM_Benchmarking
