
Implementation:OpenGVLab InternVL TextVQAAccuracyEvaluator

From Leeroopedia


Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete tool from the InternVL evaluation framework for computing VQA soft accuracy scores.

Description

The TextVQAAccuracyEvaluator class computes VQA soft accuracy by:

  1. Normalizing predictions and ground-truth answers using EvalAIAnswerProcessor
  2. Computing a soft accuracy score per question: min(1, matches / 3), averaged over leave-one-out subsets of the 10 ground-truth answers
  3. Averaging the per-question scores across all questions

The companion STVQAANLSEvaluator uses Levenshtein edit distance for ANLS scoring.
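For context, ANLS scoring can be sketched as below. This is an illustrative implementation of the conventional Average Normalized Levenshtein Similarity formula, not the actual STVQAANLSEvaluator code; the 0.5 cutoff is the commonly used default threshold, and lowercasing stands in for the evaluator's real answer normalization.

```python
def levenshtein(a, b):
    """Classic single-row dynamic-programming edit distance."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,       # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def anls_score(pred, gt_answers, tau=0.5):
    """Best normalized similarity against any GT answer, zeroed below tau."""
    best = 0.0
    for gt in gt_answers:
        dist = levenshtein(pred.lower(), gt.lower())
        nl = dist / max(len(pred), len(gt), 1)
        best = max(best, 1.0 - nl)
    return best if best >= tau else 0.0
```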

Usage

Used by evaluate_vqa.py to compute benchmark scores after distributed inference is complete.
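A post-inference aggregation step might look like the sketch below. The file format, record keys, and function name are assumptions for illustration only, not the actual evaluate_vqa.py code; the evaluator is passed in so the sketch is framework-agnostic.

```python
import json

def score_results(results_file, evaluator):
    """Load merged predictions and compute the benchmark score.

    Assumes a JSON list of records with 'answer' (model prediction) and
    'annotation' (list of ground-truth answers) -- illustrative keys only.
    """
    with open(results_file) as f:
        results = json.load(f)
    # Reshape into the dict format eval_pred_list expects
    pred_list = [
        {'pred_answer': r['answer'], 'gt_answers': r['annotation']}
        for r in results
    ]
    return evaluator.eval_pred_list(pred_list)
```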

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/eval/vqa/textvqa_eval.py
  • Lines: L222-258

Signature

class TextVQAAccuracyEvaluator:
    def __init__(self):
        self.answer_processor = EvalAIAnswerProcessor()

    def _compute_answer_scores(self, raw_answers):
        """Compute soft accuracy scores from 10 ground truth answers."""

    def eval_pred_list(self, pred_list, disable_tqdm=False):
        """
        Evaluate a list of predictions against ground truth.

        Args:
            pred_list: List[Dict] with keys 'pred_answer' and 'gt_answers'
            disable_tqdm: bool - Disable progress bar

        Returns:
            float - Average VQA accuracy across all predictions
        """

Import

from eval.vqa.textvqa_eval import TextVQAAccuracyEvaluator

I/O Contract

Inputs

Name Type Required Description
pred_list List[Dict] Yes List of dicts with 'pred_answer' (str) and 'gt_answers' (List[str], 10 answers)

Outputs

Name Type Description
accuracy float Average VQA soft accuracy (0.0 to 1.0)

Usage Examples

Compute VQA Accuracy

from eval.vqa.textvqa_eval import TextVQAAccuracyEvaluator

evaluator = TextVQAAccuracyEvaluator()

pred_list = [
    {
        'pred_answer': 'golden gate bridge',
        'gt_answers': ['golden gate bridge'] * 8 + ['golden gate', 'bridge']
    },
    {
        'pred_answer': 'cat',
        'gt_answers': ['cat'] * 5 + ['kitten'] * 3 + ['feline', 'kitty']
    },
]

accuracy = evaluator.eval_pred_list(pred_list)
print(f'VQA Accuracy: {accuracy:.4f}')

Related Pages

Implements Principle

Requires Environment
