
Principle:OpenGVLab InternVL VQA Accuracy Scoring

From Leeroopedia


Knowledge Sources
Domains Evaluation, Vision_Language, Metrics
Last Updated 2026-02-07 00:00 GMT

Overview

A family of evaluation metrics for visual question answering that measure model prediction accuracy using soft scoring, exact matching, and edit-distance-based methods.

Description

VQA evaluation uses multiple scoring approaches depending on the benchmark:

  • VQA soft accuracy: The standard VQA metric where each question has 10 human-provided ground truth answers. Accuracy for a prediction is min(1, count_of_matching_answers / 3), reflecting inter-annotator agreement.
  • Exact match: Binary scoring — the prediction either matches any ground truth answer exactly, or it does not.
  • Relaxed accuracy: Allows 5% relative numerical tolerance for math/chart questions.
  • ANLS (Average Normalized Levenshtein Similarity): Edit-distance-based scoring for OCR-heavy benchmarks (InfographicsVQA, DocVQA), with a threshold of 0.5.
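The exact-match and relaxed-accuracy rules above can be sketched as follows. This is an illustrative sketch, not a specific benchmark's reference implementation; the function names and the fallback to string comparison for non-numeric answers are assumptions.

```python
def exact_match(prediction: str, ground_truths: list[str]) -> float:
    """Binary score: 1.0 if the prediction equals any ground truth, else 0.0."""
    return 1.0 if prediction in ground_truths else 0.0

def relaxed_accuracy(prediction: str, target: str, tolerance: float = 0.05) -> float:
    """Score 1.0 if numeric values agree within 5% relative tolerance.

    Falls back to case-insensitive exact comparison for non-numeric answers
    (an assumption; benchmarks differ on how non-numeric cases are handled).
    """
    try:
        pred, tgt = float(prediction), float(target)
    except ValueError:
        return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0
    if tgt == 0:
        return 1.0 if pred == 0 else 0.0
    return 1.0 if abs(pred - tgt) / abs(tgt) <= tolerance else 0.0
```

For example, a predicted chart value of 104 against a target of 100 scores 1.0 (4% error), while 110 scores 0.0 (10% error exceeds the 5% tolerance).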

Answer normalization is critical: before comparison, predictions and ground truths are lowercased, stripped of punctuation, have articles (a/an/the) removed, and have number words converted to digits.
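A minimal normalization sketch along these lines is shown below. The exact article list and number-word map vary across benchmark implementations; the ones here are assumptions for illustration.

```python
import re

_ARTICLES = {"a", "an", "the"}
_NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}

def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, convert number words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)                      # strip punctuation
    words = [w for w in text.split() if w not in _ARTICLES]  # remove articles
    words = [_NUMBER_WORDS.get(w, w) for w in words]         # "two" -> "2"
    return " ".join(words)
```

For example, "The Two dogs!" and "2 dogs" normalize to the same string, so they count as a match.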

Usage

Use VQA soft accuracy for standard VQA benchmarks (TextVQA, VQAv2, OKVQA). Use ANLS for document understanding benchmarks (InfographicsVQA, DocVQA). Use relaxed accuracy for chart/math benchmarks (ChartQA).

Theoretical Basis

VQA soft accuracy (per question):

$$\text{accuracy} = \min\!\left(1, \frac{|\{a \in \text{GT} : a = \hat{a}\}|}{3}\right)$$

Where $\text{GT}$ is the set of 10 ground truth answers and $\hat{a}$ is the model prediction.
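The soft-accuracy formula above can be computed directly; this sketch assumes both sides have already been normalized as described earlier.

```python
def vqa_soft_accuracy(prediction: str, ground_truths: list[str]) -> float:
    """min(1, matches / 3) over the (typically 10) human answers."""
    matches = sum(1 for answer in ground_truths if answer == prediction)
    return min(1.0, matches / 3)
```

So a prediction matching 3 or more of the 10 annotator answers scores 1.0, one matching 2 answers scores 2/3, and one matching a single answer scores 1/3.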

ANLS (per question):

$$\text{ANLS}(s_1, s_2) = \begin{cases} 1 - \text{NL}(s_1, s_2) & \text{if } \text{NL}(s_1, s_2) < 0.5 \\ 0 & \text{otherwise} \end{cases}$$

Where $\text{NL}(s_1, s_2) = \dfrac{\text{LevenshteinDistance}(s_1, s_2)}{\max(|s_1|, |s_2|)}$.
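A self-contained sketch of this scoring rule, using a standard dynamic-programming edit distance (the handling of two empty strings is an assumption; the formula leaves that case undefined):

```python
def levenshtein(s1: str, s2: str) -> int:
    """Classic dynamic-programming Levenshtein edit distance."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (c1 != c2)))  # substitution
        prev = curr
    return prev[-1]

def anls(prediction: str, target: str, threshold: float = 0.5) -> float:
    """1 - NL if NL < threshold, else 0, per the formula above."""
    if not prediction and not target:
        return 1.0
    nl = levenshtein(prediction, target) / max(len(prediction), len(target))
    return 1.0 - nl if nl < threshold else 0.0
```

The 0.5 threshold zeroes out predictions that differ from the target in more than half their characters, so near-miss OCR readings get partial credit while unrelated strings score 0.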

