Implementation:OpenGVLab InternVL Evaluate Sh
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Benchmarking |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for dispatching benchmark evaluations via a shell script provided by the InternVL evaluation framework.
Description
The evaluate.sh script is the master evaluation dispatcher for InternVL. It takes a checkpoint path and a dataset name and routes the run to the matching Python evaluation script. It launches distributed inference through torchrun, with the GPU count configurable via the GPUS environment variable.
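The routing described above can be sketched as a `case` statement that maps a dataset name to an evaluation script and prints the `torchrun` command it would launch. This is a dry-run illustration, not the contents of evaluate.sh: the function name `build_eval_cmd`, the script paths, and the `--dataset` flag are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Dry-run sketch of a checkpoint/dataset dispatcher (illustrative only).
# Hypothetical: build_eval_cmd, the script paths, and the --dataset flag are
# placeholders, not the actual contents of InternVL's evaluate.sh.

# Usage: build_eval_cmd <CHECKPOINT> <DATASET> [extra_args...]
build_eval_cmd() {
  local checkpoint=$1 dataset=$2
  shift 2  # anything after DATASET is forwarded to the Python script
  local script
  case "$dataset" in
    vqa-*)       script="eval/vqa/evaluate_vqa.py" ;;      # hypothetical path
    mantis|mmiu) script="eval/multi_image/evaluate.py" ;;  # hypothetical path
    mmhal)       script="eval/mmhal/evaluate_mmhal.py" ;;  # hypothetical path
    *)
      echo "unknown dataset: $dataset" >&2
      return 1
      ;;
  esac
  # Print the launch command instead of running it.
  echo torchrun --nnodes=1 --nproc_per_node="${GPUS:-8}" \
    --master_port="${MASTER_PORT:-63669}" \
    "$script" --checkpoint "$checkpoint" --dataset "$dataset" "$@"
}
```

Because it only echoes, the sketch can be tried without GPUs, for example `GPUS=4 build_eval_cmd ./output/finetune vqa-textvqa-val --auto`.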
Usage
Run from the internvl_chat directory after training to evaluate model performance across benchmarks.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/evaluate.sh
- Lines: L1-726
Signature
```bash
# Usage:
bash evaluate.sh <CHECKPOINT> <DATASET> [extra_args...]

# Environment variables (defaults shown):
GPUS=${GPUS:-8}                    # Number of GPUs (default 8)
GPUS_PER_NODE=${GPUS_PER_NODE:-8}  # GPUs per node (default 8)
MASTER_PORT=${MASTER_PORT:-63669}  # torchrun master port

# Special flags (passed after <DATASET>):
#   --auto   Single-GPU mode with auto device mapping
```
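The defaults in the signature use Bash's `${VAR:-default}` expansion, so a value exported by the caller wins and the listed default applies otherwise. A minimal sketch of that handling, assuming (in line with the "Single-GPU mode" description) that `--auto` reduces the run to one process; `parse_eval_args` and `EXTRA_ARGS` are hypothetical names, not taken from evaluate.sh.

```bash
#!/usr/bin/env bash
# Sketch of the signature's argument and environment handling (illustrative).
# Hypothetical: parse_eval_args and EXTRA_ARGS are not names from evaluate.sh.
# Assumption: --auto forces GPUS=1, matching "Single-GPU mode".

# Usage: parse_eval_args <CHECKPOINT> <DATASET> [extra_args...]
parse_eval_args() {
  if (( $# < 2 )); then
    echo "usage: parse_eval_args <CHECKPOINT> <DATASET> [extra_args...]" >&2
    return 2
  fi
  CHECKPOINT=$1
  DATASET=$2
  shift 2
  EXTRA_ARGS=("$@")   # kept verbatim for the Python script
  # ${VAR:-default}: use the caller's value if set and non-empty.
  GPUS=${GPUS:-8}
  GPUS_PER_NODE=${GPUS_PER_NODE:-8}
  MASTER_PORT=${MASTER_PORT:-63669}
  local arg
  for arg in "$@"; do
    if [[ $arg == --auto ]]; then
      GPUS=1          # one process; device placement left to auto mapping
    fi
  done
  return 0
}
```

Scanning the extra arguments rather than consuming them keeps `--auto` available to the downstream script as well.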
Import
```bash
cd internvl_chat
bash evaluate.sh ./output/checkpoint vqa-textvqa-val
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| CHECKPOINT | str (positional) | Yes | Path to model checkpoint |
| DATASET | str (positional) | Yes | Benchmark name (e.g., 'vqa-textvqa-val', 'mantis', 'mmhal') |
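| GPUS | env var | No | Number of GPUs (default 8) |
| GPUS_PER_NODE | env var | No | GPUs per node (default 8) |
| MASTER_PORT | env var | No | torchrun master port (default 63669) |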
| --auto | flag | No | Enable single-GPU auto device mapping |
Outputs
| Name | Type | Description |
|---|---|---|
| Results file | JSON/JSONL | Per-sample predictions saved by the evaluation script |
| Metrics | stdout | Benchmark scores printed to console |
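Because benchmark scores are only printed to the console, it can help to keep a copy of each run's output next to the results file. A hedged sketch: `run_eval_logged`, `EVAL_SCRIPT`, and `LOG_DIR` are hypothetical names introduced here, and the input check assumes the checkpoint is a directory.

```bash
#!/usr/bin/env bash
# Sketch: run one evaluation and keep its console output (where metrics appear).
# Hypothetical names: run_eval_logged, EVAL_SCRIPT, LOG_DIR (not part of InternVL).
# Assumption: the checkpoint is a directory.

# Usage: run_eval_logged <CHECKPOINT> <DATASET> [extra_args...]
run_eval_logged() {
  if (( $# < 2 )); then
    echo "usage: run_eval_logged <CHECKPOINT> <DATASET> [extra_args...]" >&2
    return 2
  fi
  local checkpoint=$1 dataset=$2
  shift 2
  local log_dir=${LOG_DIR:-eval_logs}
  if [[ ! -d $checkpoint ]]; then
    echo "checkpoint directory not found: $checkpoint" >&2
    return 1
  fi
  mkdir -p "$log_dir"
  # Merge stderr into stdout and copy everything to <log_dir>/<dataset>.log.
  bash "${EVAL_SCRIPT:-evaluate.sh}" "$checkpoint" "$dataset" "$@" 2>&1 \
    | tee "$log_dir/$dataset.log"
  # Report the evaluation script's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

Returning `PIPESTATUS[0]` matters here: without it, the pipeline's status would be that of `tee`, which almost always succeeds.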
Usage Examples
Evaluate on TextVQA
```bash
cd internvl_chat
# Multi-GPU evaluation (8 GPUs)
bash evaluate.sh ./output/finetune vqa-textvqa-val
# Single-GPU with auto device mapping
bash evaluate.sh ./output/finetune vqa-textvqa-val --auto
# Custom GPU count
GPUS=4 bash evaluate.sh ./output/finetune vqa-textvqa-val
```
Evaluate Multiple Benchmarks
```bash
CHECKPOINT="./output/finetune"
# VQA benchmarks
bash evaluate.sh "$CHECKPOINT" vqa-textvqa-val
bash evaluate.sh "$CHECKPOINT" vqa-docvqa-val
bash evaluate.sh "$CHECKPOINT" vqa-chartqa-test
# Multi-image
bash evaluate.sh "$CHECKPOINT" mantis
bash evaluate.sh "$CHECKPOINT" mmiu
# Hallucination
bash evaluate.sh "$CHECKPOINT" mmhal
```
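The repeated calls above can also be written as a loop that keeps going when one benchmark fails and reports the failures at the end. A sketch only: `run_benchmarks`, `BENCHMARKS`, and `EVAL_SCRIPT` are hypothetical, and continuing past failures is a design choice of this example, not documented behavior of evaluate.sh.

```bash
#!/usr/bin/env bash
# Sketch: run several benchmarks sequentially against one checkpoint.
# Hypothetical names: run_benchmarks, BENCHMARKS, EVAL_SCRIPT (not part of InternVL).
CHECKPOINT=${CHECKPOINT:-./output/finetune}
BENCHMARKS=(vqa-textvqa-val vqa-docvqa-val vqa-chartqa-test mantis mmiu mmhal)

# Usage: run_benchmarks <CHECKPOINT> <DATASET>...
run_benchmarks() {
  local checkpoint=$1
  shift
  local ds
  local failed=()
  for ds in "$@"; do
    # One evaluate.sh call per benchmark; a failure is recorded, not fatal.
    if ! bash "${EVAL_SCRIPT:-evaluate.sh}" "$checkpoint" "$ds"; then
      failed+=("$ds")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    echo "failed benchmarks: ${failed[*]}" >&2
    return 1
  fi
  return 0
}
```

From the internvl_chat directory, call it as `run_benchmarks "$CHECKPOINT" "${BENCHMARKS[@]}"`. Running the benchmarks one after another also avoids two torchrun launches competing for the same default MASTER_PORT.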
Related Pages
Implements Principle
Requires Environment