Implementation:OpenGVLab InternVL Evaluate Sh

Knowledge Sources	InternVL
Domains	Evaluation, Benchmarking
Last Updated	2026-02-07 00:00 GMT

Overview

A shell script, provided by the InternVL evaluation framework, that dispatches benchmark evaluations.

Description

The evaluate.sh script is the top-level evaluation dispatcher for InternVL. It takes a checkpoint path and a dataset name as positional arguments, then routes the run to the Python evaluation script for that benchmark. Distributed inference is launched through torchrun, with the GPU count set by the GPUS environment variable (default 8).
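The routing described above can be pictured as a case statement keyed on the dataset name. The following is a minimal sketch, not the contents of evaluate.sh: the function names, the script paths, and the --checkpoint argument passed to the Python script are illustrative assumptions, and the sketch prints the torchrun command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of a dataset dispatcher. The script paths and the --checkpoint
# argument are illustrative assumptions, not confirmed details of evaluate.sh.

GPUS=${GPUS:-8}
MASTER_PORT=${MASTER_PORT:-63669}

# Map a dataset name to an evaluation script (hypothetical paths).
resolve_eval_script() {
    case "$1" in
        vqa-*)  echo "eval/vqa/evaluate_vqa.py" ;;
        mantis) echo "eval/mantis_eval/evaluate_mantis.py" ;;
        mmhal)  echo "eval/mmhal/evaluate_mmhal.py" ;;
        *)      return 1 ;;
    esac
}

# Print the torchrun command that would be launched, without running it.
build_command() {
    local checkpoint=$1 dataset=$2 script
    script=$(resolve_eval_script "$dataset") || {
        echo "unknown dataset: $dataset" >&2
        return 1
    }
    echo "torchrun --nproc_per_node=${GPUS} --master_port=${MASTER_PORT} ${script} --checkpoint ${checkpoint}"
}

# Dry run: prints the command line for one benchmark.
build_command ./output/checkpoint vqa-textvqa-val
```

Keeping the name-to-script mapping in one function means an unknown dataset fails early with a clear message rather than reaching torchrun.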

Usage

Run from the internvl_chat directory after training to evaluate model performance across benchmarks.

Code Reference

Source Location

Repository: InternVL
File: internvl_chat/evaluate.sh
Lines: L1-726

Signature

# Usage:
bash evaluate.sh <CHECKPOINT> <DATASET> [extra_args...]

# Environment variables:
GPUS=${GPUS:-8}              # Number of GPUs (default 8)
GPUS_PER_NODE=${GPUS_PER_NODE:-8}
MASTER_PORT=${MASTER_PORT:-63669}

# Special flags:
--auto                       # Single-GPU mode with auto device mapping
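The defaults above rely on the shell's ${VAR:-default} expansion, which uses the environment value when one is set and falls back to the default otherwise. The sketch below shows that pattern together with one plausible way to detect --auto among the arguments. The function name resolve_gpus and the choice to force the GPU count to 1 are assumptions inferred from the "single-GPU mode" description, not code taken from evaluate.sh.

```shell
#!/usr/bin/env bash
# Sketch: environment-variable default plus --auto detection.
# Forcing the count to 1 under --auto is an assumption based on the
# "single-GPU mode" description, not a copy of evaluate.sh.

resolve_gpus() {
    local gpus=${GPUS:-8}   # environment value if set, otherwise 8
    local arg
    for arg in "$@"; do
        if [ "$arg" = "--auto" ]; then
            gpus=1
        fi
    done
    echo "$gpus"
}

resolve_gpus ./output/checkpoint vqa-textvqa-val          # 8 unless GPUS is set
resolve_gpus ./output/checkpoint vqa-textvqa-val --auto   # 1
```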

Import

cd internvl_chat
bash evaluate.sh ./output/checkpoint vqa-textvqa-val

I/O Contract

Inputs

Name	Type	Required	Description
CHECKPOINT	str (positional)	Yes	Path to model checkpoint
DATASET	str (positional)	Yes	Benchmark name (e.g., 'vqa-textvqa-val', 'mantis', 'mmhal')
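GPUS	env var	No	Number of GPUs (default 8)
GPUS_PER_NODE	env var	No	Number of GPUs per node (default 8)
MASTER_PORT	env var	No	Master port for the distributed launch (default 63669)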
--auto	flag	No	Enable single-GPU auto device mapping

Outputs

Name	Type	Description
Results file	JSON/JSONL	Per-sample predictions saved by the evaluation script
Metrics	stdout	Benchmark scores printed to console

Usage Examples

Evaluate on TextVQA

cd internvl_chat

# Multi-GPU evaluation (8 GPUs)
bash evaluate.sh ./output/finetune vqa-textvqa-val

# Single-GPU with auto device mapping
bash evaluate.sh ./output/finetune vqa-textvqa-val --auto

# Custom GPU count
GPUS=4 bash evaluate.sh ./output/finetune vqa-textvqa-val

Evaluate Multiple Benchmarks

cd internvl_chat

CHECKPOINT="./output/finetune"

# VQA benchmarks
bash evaluate.sh "$CHECKPOINT" vqa-textvqa-val
bash evaluate.sh "$CHECKPOINT" vqa-docvqa-val
bash evaluate.sh "$CHECKPOINT" vqa-chartqa-test

# Multi-image
bash evaluate.sh "$CHECKPOINT" mantis
bash evaluate.sh "$CHECKPOINT" mmiu

# Hallucination
bash evaluate.sh "$CHECKPOINT" mmhal
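When the list of benchmarks grows, the repeated invocations can be wrapped in a loop that also records which runs failed. This is a sketch under stated assumptions: run_benchmarks and the EVAL_CMD override are hypothetical helpers introduced here so the loop can be dry-run without GPUs, and the sketch assumes evaluate.sh exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Sketch: run several benchmarks in sequence and report failures.
# EVAL_CMD is a hypothetical override used for dry runs; by default the
# loop calls "bash evaluate.sh" and assumes it exits non-zero on failure.

run_benchmarks() {
    local checkpoint=$1; shift
    local failed=()
    local dataset
    for dataset in "$@"; do
        echo "=== ${dataset} ==="
        # Left unquoted on purpose so "bash evaluate.sh" splits into two words.
        if ! ${EVAL_CMD:-bash evaluate.sh} "$checkpoint" "$dataset"; then
            failed+=("$dataset")
        fi
    done
    if [ ${#failed[@]} -gt 0 ]; then
        echo "failed: ${failed[*]}" >&2
        return 1
    fi
}

# Dry run: substitute echo for the real script to preview the calls.
EVAL_CMD=echo run_benchmarks ./output/finetune vqa-textvqa-val mantis mmhal
```

Dropping the EVAL_CMD=echo prefix runs the real script from the internvl_chat directory. A failed benchmark does not stop the remaining ones, and the function returns non-zero if any run failed.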

Related Pages

Implements Principle

Principle:OpenGVLab_InternVL_Benchmark_Dispatch

Requires Environment

Environment:OpenGVLab_InternVL_PyTorch_CUDA
