Principle: OpenGVLab InternVL VQA Accuracy Scoring
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Vision_Language, Metrics |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A family of evaluation metrics for visual question answering that measure model prediction accuracy using soft scoring, exact matching, and edit-distance-based methods.
Description
VQA evaluation uses multiple scoring approaches depending on the benchmark:
- VQA soft accuracy: The standard VQA metric where each question has 10 human-provided ground truth answers. Accuracy for a prediction is min(1, count_of_matching_answers / 3), reflecting inter-annotator agreement.
- Exact match: Binary scoring — the prediction either matches any ground truth answer exactly, or it does not.
- Relaxed accuracy: Allows 5% relative numerical tolerance for math/chart questions.
- ANLS (Average Normalized Levenshtein Similarity): Edit-distance-based scoring for OCR-heavy benchmarks (InfographicsVQA, DocVQA), with a threshold of 0.5.
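The first three scoring rules above can be sketched as follows. This is a minimal illustration, not the InternVL codebase's implementation; function names and the string-fallback behavior of the relaxed metric are assumptions.

```python
def vqa_soft_accuracy(prediction: str, ground_truths: list[str]) -> float:
    """VQA soft accuracy: min(1, matches / 3) over the 10 human answers."""
    matches = sum(1 for gt in ground_truths if gt == prediction)
    return min(1.0, matches / 3.0)

def exact_match(prediction: str, ground_truths: list[str]) -> float:
    """Binary: 1 if the prediction equals any ground truth, else 0."""
    return 1.0 if prediction in ground_truths else 0.0

def relaxed_accuracy(prediction: str, target: str, tolerance: float = 0.05) -> float:
    """Relaxed accuracy: 5% relative numerical tolerance for numeric answers;
    falls back to exact string match for non-numeric answers (an assumption)."""
    try:
        pred, tgt = float(prediction), float(target)
        if tgt == 0:
            return 1.0 if pred == 0 else 0.0
        return 1.0 if abs(pred - tgt) / abs(tgt) <= tolerance else 0.0
    except ValueError:
        return 1.0 if prediction == target else 0.0
```

For example, a prediction matching 2 of the 10 annotators scores 2/3 under soft accuracy, while 3 or more matches saturate at 1.0.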
Answer normalization is critical: before comparison, both predictions and ground truths are lowercased, stripped of punctuation, have articles (a/an/the) removed, and have number words converted to digits.
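A hedged sketch of that normalization pipeline is below; the exact regexes and word lists vary between benchmark implementations, so treat the specific rules here as assumptions.

```python
import re

_ARTICLES = {"a", "an", "the"}
# Illustrative number-word map; real evaluators typically cover more words.
_NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}

def normalize_answer(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)                      # strip punctuation
    words = [w for w in text.split() if w not in _ARTICLES]  # drop articles
    words = [_NUMBER_WORDS.get(w, w) for w in words]         # number words -> digits
    return " ".join(words)
```

For instance, "The two cats!" normalizes to "2 cats", so it matches a ground truth of "2 cats" that would otherwise fail an exact comparison.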
Usage
Use VQA soft accuracy for standard VQA benchmarks (TextVQA, VQAv2, OKVQA). Use ANLS for document understanding benchmarks (InfographicsVQA, DocVQA). Use relaxed accuracy for chart/math benchmarks (ChartQA).
Theoretical Basis
VQA soft accuracy (per question):

$$\mathrm{Acc}(a) = \min\left(1,\; \frac{\left|\{\,g \in G : g = a\,\}\right|}{3}\right)$$

Where $G$ is the set of 10 ground truth answers and $a$ is the model prediction.
ANLS (per question):

$$\mathrm{ANLS}(a, G) = \max_{g \in G} s(a, g), \qquad s(a, g) = \begin{cases} 1 - \mathrm{NL}(a, g) & \text{if } \mathrm{NL}(a, g) < \tau \\ 0 & \text{otherwise} \end{cases}$$

Where $\mathrm{NL}(a, g)$ is the Levenshtein distance between $a$ and $g$ normalized by the length of the longer string, and $\tau = 0.5$ is the threshold.
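Assuming the per-question ANLS form above (maximum over ground truths, threshold at $\tau = 0.5$), a minimal sketch with a plain dynamic-programming Levenshtein distance:

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance via the classic row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]

def anls_score(prediction: str, ground_truths: list[str], tau: float = 0.5) -> float:
    """Per-question ANLS: best thresholded similarity over all ground truths."""
    best = 0.0
    for gt in ground_truths:
        if not prediction and not gt:
            nl = 0.0
        else:
            nl = levenshtein(prediction, gt) / max(len(prediction), len(gt))
        if nl < tau:  # similarities below the threshold score 0
            best = max(best, 1.0 - nl)
    return best
```

The threshold zeroes out near-random matches: a prediction one edit away from a 4-character answer still scores 0.75, while one that differs in every character scores 0.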