Implementation:Mit han lab Llm awq InternVL Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Multimodal |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A CLI benchmarking script for InternVL3 vision-language models. It measures inference performance on four multimodal tasks (video captioning, video QA, image captioning, image QA), with optional quantization of the LLM backbone and the vision tower.
Description
This script provides a command-line interface for benchmarking InternVL3 models on four multimodal tasks. Throughput and latency are reported by the model's built-in benchmark method.
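The main function begins by disabling PyTorch parameter initialization (kaiming_uniform_, kaiming_normal_, uniform_, normal_) and the HuggingFace _init_weights hook. Random initialization is wasted work here because the weights are replaced when the checkpoint is loaded, so skipping it accelerates model loading. The function also sets PYTORCH_CUDA_ALLOC_CONF to "expandable_segments:True", which lets the CUDA caching allocator grow existing memory segments and reduces fragmentation.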
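The initialization-skipping step can be sketched as follows. This is a minimal illustration rather than the script's own code: `disable_init` and `_skip` are hypothetical names, and a `SimpleNamespace` stands in for `torch.nn.init` so that the snippet runs without PyTorch installed.

```python
import os
from types import SimpleNamespace

# The four initializers named in the description above.
INIT_FUNCS = ("kaiming_uniform_", "kaiming_normal_", "uniform_", "normal_")


def _skip(tensor=None, *args, **kwargs):
    """No-op initializer: hand the tensor back untouched."""
    return tensor


def disable_init(init_module, names=INIT_FUNCS):
    """Replace the named initializers on init_module with no-ops.

    Returns the original functions so they can be restored later.
    """
    originals = {}
    for name in names:
        originals[name] = getattr(init_module, name)
        setattr(init_module, name, _skip)
    return originals


# Set the allocator option before CUDA is initialized so it takes effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Stand-in for torch.nn.init (assumption: the real script patches torch itself).
fake_init = SimpleNamespace(
    **{name: (lambda t, *a, **k: "initialized") for name in INIT_FUNCS}
)
saved = disable_init(fake_init)
```

With PyTorch available, passing `torch.nn.init` as `init_module` would neutralize the same four functions. The HuggingFace `_init_weights` hook would need a comparable patch on the model class.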
The model class is InternVL3 from tinychat.models. It is instantiated from an AutoConfig whose resume_path is set to the model checkpoint directory, and the model is then cast to half precision. When --quant_llm or --all is specified, the LLM backbone is converted to W4A16 (4-bit weights, 16-bit activations) via real_quantize_model_weight with 4-bit weights, group size 128, zero-point enabled, and init_only=True, so the quantized layers are created without quantizing the existing weights. The conversion is followed by fused kernel replacements: make_quant_attn, make_quant_norm, and make_fused_mlp. When --quant_VT or --all is specified, the vision model encoder is wrapped with QuantInternVisionEncoder and optionally compiled with torch.compile.
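To make the quantization parameters concrete, here is a pure-Python sketch of group-wise 4-bit quantization with a zero point. It only illustrates the arithmetic implied by "4-bit weights, group size 128, zero-point enabled"; the function names are hypothetical, and the real `real_quantize_model_weight` operates on tensors and packed kernel layouts that this sketch does not attempt to reproduce.

```python
def quantize_group(weights, w_bit=4):
    """Asymmetric (zero-point) quantization of one group of weights."""
    max_int = 2 ** w_bit - 1                      # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = max(w_max - w_min, 1e-5) / max_int    # guard against a flat group
    zero = min(max(-round(w_min / scale), 0), max_int)
    q = [min(max(round(w / scale) + zero, 0), max_int) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [(v - zero) * scale for v in q]


def quantize_row(row, group_size=128, w_bit=4):
    """Quantize a row of weights one group at a time.

    Each group gets its own scale and zero point.
    """
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    return [
        quantize_group(row[i:i + group_size], w_bit)
        for i in range(0, len(row), group_size)
    ]


# Example: a 256-element row yields two groups of 128.
row = [(i - 128) / 128 for i in range(256)]
groups = quantize_row(row)
```

Giving every 128-weight group its own scale and zero point keeps the rounding error tied to the local value range instead of the range of the whole row.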
Each benchmark task constructs a prompt combining media (Image or Video from llava.media) with a task-specific text query, resets the conversation template via clib.conv_templates, and calls model.benchmark(prompt, quant_llm) under torch.no_grad(). The four tasks are:
- video_caption: "Elaborate on the visual and narrative elements of the video in detail."
- video_QA: Multiple-choice question about video content.
- image_caption: "Describe the image in detail."
- image_QA: Multiple-choice question about image text content.
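The four tasks above can be pictured as a small registry that pairs a media kind with a text query. The sketch below is an assumption about structure, not the script's code: `Task`, `TASKS`, `build_prompt`, and `selected_tasks` are hypothetical names, a `(kind, path)` tuple stands in for the `Image` and `Video` objects from llava.media, and the two QA queries are placeholders because the source does not quote them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    media_kind: str  # "video" or "image"
    query: str       # text part of the prompt


# QA queries are placeholders; only the caption queries are quoted in the source.
TASKS = {
    "video_caption": Task(
        "video",
        "Elaborate on the visual and narrative elements of the video in detail.",
    ),
    "video_QA": Task("video", "<multiple-choice question about the video>"),
    "image_caption": Task("image", "Describe the image in detail."),
    "image_QA": Task("image", "<multiple-choice question about the image text>"),
}


def build_prompt(task_name, video_path, image_path):
    """Return a [media, text] prompt for the named task."""
    task = TASKS[task_name]
    path = video_path if task.media_kind == "video" else image_path
    return [(task.media_kind, path), task.query]


def selected_tasks(flags, run_all=False):
    """List task names whose flag is set, in registry order."""
    return [name for name in TASKS if run_all or flags.get(name, False)]
```

In the real script each prompt would then be passed to `model.benchmark(prompt, quant_llm)` under `torch.no_grad()`, after the conversation template has been reset.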
Usage
Run from the command line to benchmark InternVL3 with various quantization configurations:
# Benchmark all tasks with full quantization
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --quant_path /path/to/quant.pt \
    --all

# Benchmark only image tasks without quantization
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --image_caption --image_QA

# Benchmark with LLM quantization only
python tinychat/internvl_benchmark.py \
    --model-path /path/to/internvl3 \
    --quant_llm --all_task
Code Reference
Source Location
- Repository: Mit_han_lab_Llm_awq
- File: tinychat/internvl_benchmark.py
- Lines: 1-167
Signature
def main() -> None:
Import
# CLI script, run directly:
python tinychat/internvl_benchmark.py [OPTIONS]
I/O Contract
CLI Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| --model-path, -m | str | (required) | Path to InternVL3 model checkpoint |
| --quant_path | str | /PATH/TO/QUANT | Path to quantized weight file |
| --conv-mode, -c | str | auto | Conversation template mode |
| --device | str | cuda:0 | CUDA device |
| --act_scale_path | str | /PATH/TO/SCALE | Path to activation scales |
| --quant_llm | flag | False | Quantize the LLM backbone (W4A16) |
| --quant_VT | flag | False | Quantize the vision tower |
| --video_caption | flag | False | Run video captioning benchmark |
| --video_QA | flag | False | Run video QA benchmark |
| --image_caption | flag | False | Run image captioning benchmark |
| --image_QA | flag | False | Run image QA benchmark |
| --all | flag | False | Enable all quantization and all tasks |
| --all_task | flag | False | Run all four benchmark tasks |
| --fakequant_VT | flag | False | Use fake quantization for vision tower |
| --video_path | str | ../figures/nvila_demo_video.mp4 | Path to benchmark video |
| --image_path | str | ../figures/vila-logo.jpg | Path to benchmark image |
| --max_seq_len | int | 8192 | Maximum sequence length |
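The argument table maps onto `argparse` roughly as shown below. This is a sketch reconstructed from the table, not a copy of the script: `build_parser`, `resolve`, and `TASK_FLAGS` are hypothetical names, and the way `--all` and `--all_task` are folded into per-feature booleans is an assumption based on their descriptions.

```python
import argparse

TASK_FLAGS = ("video_caption", "video_QA", "image_caption", "image_QA")
OTHER_FLAGS = ("quant_llm", "quant_VT", "all", "all_task", "fakequant_VT")


def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    p = argparse.ArgumentParser(description="InternVL3 benchmark (sketch)")
    p.add_argument("--model-path", "-m", type=str, required=True)
    p.add_argument("--quant_path", type=str, default="/PATH/TO/QUANT")
    p.add_argument("--conv-mode", "-c", type=str, default="auto")
    p.add_argument("--device", type=str, default="cuda:0")
    p.add_argument("--act_scale_path", type=str, default="/PATH/TO/SCALE")
    for flag in TASK_FLAGS + OTHER_FLAGS:
        p.add_argument(f"--{flag}", action="store_true")
    p.add_argument("--video_path", type=str,
                   default="../figures/nvila_demo_video.mp4")
    p.add_argument("--image_path", type=str,
                   default="../figures/vila-logo.jpg")
    p.add_argument("--max_seq_len", type=int, default=8192)
    return p


def resolve(args):
    """Fold --all and --all_task into per-feature settings (assumed logic)."""
    run_all_tasks = args.all or args.all_task
    return {
        "quant_llm": args.quant_llm or args.all,
        "quant_VT": args.quant_VT or args.all,
        "tasks": [t for t in TASK_FLAGS if run_all_tasks or getattr(args, t)],
    }
```

For example, `resolve(build_parser().parse_args(["-m", "/models/internvl3-8b", "--quant_llm", "--all_task"]))` selects all four tasks with only the LLM backbone quantized.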
Output
| Output | Description |
|---|---|
| stdout | Benchmark results printed per task with separator lines; includes model.benchmark() output (timing/throughput metrics) |
Usage Examples
# Full benchmark with all quantization options
python tinychat/internvl_benchmark.py \
    --model-path /models/internvl3-8b \
    --quant_path /models/internvl3-8b-w4-g128-awq.pt \
    --all \
    --video_path /data/test_video.mp4 \
    --image_path /data/test_image.jpg

# Quick image-only benchmark without quantization
python tinychat/internvl_benchmark.py \
    --model-path /models/internvl3-8b \
    --image_caption \
    --image_QA