Implementation:OpenGVLab InternVL Evaluate Sh

Knowledge Sources
Domains Evaluation, Benchmarking
Last Updated 2026-02-07 00:00 GMT

Overview

evaluate.sh is the shell script in the InternVL evaluation framework that dispatches benchmark evaluations.

Description

The evaluate.sh script is the master evaluation dispatcher for InternVL. It accepts a checkpoint path and a dataset name, then routes to the appropriate Python evaluation script. It launches distributed inference through torchrun, with the GPU count set by the GPUS environment variable.
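The routing described above can be pictured as a `case` statement that maps a dataset name to a Python script and wraps it in a `torchrun` launch. The following is a minimal sketch, not the contents of evaluate.sh: the function name `build_eval_cmd`, the script paths, and the `--checkpoint` argument are assumptions made for illustration, and the function only prints the command instead of running it.

```shell
# Sketch of the dispatch pattern. Script paths and the --checkpoint argument
# are HYPOTHETICAL placeholders, not taken from evaluate.sh itself.

GPUS=${GPUS:-8}
MASTER_PORT=${MASTER_PORT:-63669}

# Print (rather than run) the command that would be launched for a dataset.
build_eval_cmd() {
  local checkpoint="$1" dataset="$2" script
  case "$dataset" in
    vqa-*)  script="eval/vqa/evaluate_vqa.py" ;;        # hypothetical path
    mantis) script="eval/mantis/evaluate_mantis.py" ;;  # hypothetical path
    *)
      echo "unknown dataset: $dataset" >&2
      return 1
      ;;
  esac
  printf '%s\n' "torchrun --nproc_per_node=${GPUS} --master_port=${MASTER_PORT} ${script} --checkpoint ${checkpoint}"
}
```

Calling `build_eval_cmd ./output/finetune vqa-textvqa-val` prints the assembled command line, which makes it easy to check the routing before any GPUs are used.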

Usage

Run the script from the internvl_chat directory once training has finished; each invocation evaluates the checkpoint on one benchmark.

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/evaluate.sh
  • Lines: L1-726

Signature

# Usage:
bash evaluate.sh <CHECKPOINT> <DATASET> [extra_args...]

# Environment variables:
GPUS=${GPUS:-8}                      # Number of GPUs (default 8)
GPUS_PER_NODE=${GPUS_PER_NODE:-8}    # GPUs per node (default 8)
MASTER_PORT=${MASTER_PORT:-63669}    # Master port for torchrun (default 63669)

# Special flags:
--auto                       # Single-GPU mode with auto device mapping
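The `${VAR:-default}` form in the signature keeps a value supplied through the environment and falls back to the default otherwise. The sketch below combines that with a scan of the arguments for `--auto`. The helper name `resolve_gpus` is hypothetical, and treating `--auto` as "use one GPU" is an assumption based on the "single-GPU mode" description rather than on the script's source.

```shell
# Sketch: resolve the effective GPU count from the environment and flags.
# resolve_gpus is a hypothetical helper, not a function in evaluate.sh.
resolve_gpus() {
  # Use $GPUS if it is set and non-empty, otherwise default to 8.
  local gpus="${GPUS:-8}"
  local arg
  for arg in "$@"; do
    # Assumption: --auto means single-GPU mode, so it overrides GPUS.
    if [ "$arg" = "--auto" ]; then
      gpus=1
    fi
  done
  printf '%s\n' "$gpus"
}
```

For example, `GPUS=4 resolve_gpus ckpt vqa-textvqa-val` yields 4, while adding `--auto` to the arguments yields 1 regardless of GPUS.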

Import

cd internvl_chat
bash evaluate.sh ./output/checkpoint vqa-textvqa-val

I/O Contract

Inputs

Name        Type              Required  Description
CHECKPOINT  str (positional)  Yes       Path to model checkpoint
DATASET     str (positional)  Yes       Benchmark name (e.g., 'vqa-textvqa-val', 'mantis', 'mmhal')
GPUS        env var           No        Number of GPUs (default 8)
--auto      flag              No        Enable single-GPU auto device mapping

Outputs

Name          Type        Description
Results file  JSON/JSONL  Per-sample predictions saved by the evaluation script
Metrics       stdout      Benchmark scores printed to console
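Because the metrics are only printed to the console, it can be useful to keep a copy of the output next to the results file. The following is a minimal sketch of one way to do that; `run_and_log` and the log path in the usage note are hypothetical names, not part of the evaluation framework. It writes the command's output to a log file, prints it once the command has finished, and preserves the exit status.

```shell
# Sketch: run a command, save stdout and stderr to a log file, then print
# the log. run_and_log is a hypothetical helper.
run_and_log() {
  local log_file="$1" status
  shift
  mkdir -p "$(dirname "$log_file")"
  # Capture both streams so errors end up in the log as well.
  "$@" >"$log_file" 2>&1
  status=$?
  cat "$log_file"
  return "$status"
}
```

A call such as `run_and_log logs/textvqa.log bash evaluate.sh ./output/finetune vqa-textvqa-val` would leave the printed scores in `logs/textvqa.log`. For live output during a long run, piping through `tee` is a common alternative.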

Usage Examples

Evaluate on TextVQA

cd internvl_chat

# Multi-GPU evaluation (default: 8 GPUs)
bash evaluate.sh ./output/finetune vqa-textvqa-val

# Single-GPU with auto device mapping
bash evaluate.sh ./output/finetune vqa-textvqa-val --auto

# Custom GPU count
GPUS=4 bash evaluate.sh ./output/finetune vqa-textvqa-val

Evaluate Multiple Benchmarks

CHECKPOINT="./output/finetune"

# VQA benchmarks
bash evaluate.sh "$CHECKPOINT" vqa-textvqa-val
bash evaluate.sh "$CHECKPOINT" vqa-docvqa-val
bash evaluate.sh "$CHECKPOINT" vqa-chartqa-test

# Multi-image
bash evaluate.sh "$CHECKPOINT" mantis
bash evaluate.sh "$CHECKPOINT" mmiu

# Hallucination
bash evaluate.sh "$CHECKPOINT" mmhal
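When many benchmarks are run back to back, a loop that continues after a failure and reports which datasets failed can save a rerun. This is a sketch under stated assumptions: `run_benchmarks` and the `EVAL_CMD` override are hypothetical names introduced here so the loop can be dry-run, and by default the loop simply calls `bash evaluate.sh` with the same positional arguments used in the examples above.

```shell
# Sketch: evaluate one checkpoint on several benchmarks in sequence.
# EVAL_CMD is a hypothetical override hook; it defaults to the real script.
EVAL_CMD="${EVAL_CMD:-bash evaluate.sh}"

run_benchmarks() {
  local checkpoint="$1"
  shift
  local failed="" dataset
  for dataset in "$@"; do
    echo "=== ${dataset} ==="
    # EVAL_CMD is left unquoted on purpose so "bash evaluate.sh" splits
    # into separate words.
    if ! $EVAL_CMD "$checkpoint" "$dataset"; then
      failed="${failed} ${dataset}"
    fi
  done
  # Report failures at the end instead of stopping at the first one.
  if [ -n "$failed" ]; then
    echo "failed:${failed}" >&2
    return 1
  fi
}
```

For example, `run_benchmarks ./output/finetune vqa-textvqa-val vqa-docvqa-val mantis` runs the three evaluations in order; setting `EVAL_CMD=echo` first prints the arguments without launching anything.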

Related Pages

Implements Principle

Requires Environment
