Implementation: OpenGVLab InternVL TextVQAAccuracyEvaluator
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool in the InternVL evaluation framework for computing VQA soft accuracy scores.
Description
The TextVQAAccuracyEvaluator class computes VQA soft accuracy by:
- Normalizing predictions and ground truth answers using EvalAIAnswerProcessor
- Computing the per-question soft accuracy min(1, matches/3), where matches counts how many of the 10 ground-truth annotators gave the predicted answer
- Averaging across all questions
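The steps above can be sketched as a standalone function. This is a simplified sketch, not the class itself: it skips the EvalAIAnswerProcessor normalization and assumes answers are already normalized strings; the leave-one-out averaging over annotators follows the standard VQA soft-accuracy definition.

```python
def vqa_soft_accuracy(pred_answer, gt_answers):
    """Soft accuracy for one question, given (typically 10) ground-truth answers.

    Leave-one-out: for each annotator, count how many of the OTHER annotators
    gave the predicted answer; each subset scores min(1, matches/3), and the
    per-question score is the average over all subsets.
    """
    scores = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(1 for ans in others if ans == pred_answer)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)
```

For example, a prediction matching 3 of 10 annotators scores 0.9: the 3 matching subsets each see 2 remaining agreements (2/3), while the other 7 see all 3 (clipped to 1).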
The companion STVQAANLSEvaluator class instead scores ANLS (Average Normalized Levenshtein Similarity) using edit distance.
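ANLS scoring can be sketched in pure Python as follows. This is a hypothetical standalone version, assuming the standard similarity threshold tau = 0.5; the names `levenshtein` and `anls_score` are illustrative, not the evaluator's actual API.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def anls_score(pred_answer, gt_answers, tau=0.5):
    """Best normalized Levenshtein similarity over the ground-truth answers.

    Similarities below the threshold tau are clipped to 0, as in the
    standard ANLS definition.
    """
    best = 0.0
    for gt in gt_answers:
        nl = levenshtein(pred_answer, gt) / max(len(pred_answer), len(gt), 1)
        sim = 1.0 - nl
        best = max(best, sim if sim >= tau else 0.0)
    return best
```

The threshold keeps near-miss OCR errors partially credited while zeroing out answers that differ in more than half their characters.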
Usage
Used by evaluate_vqa.py to compute benchmark scores after distributed inference is complete.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/eval/vqa/textvqa_eval.py
- Lines: L222-258
Signature
```python
class TextVQAAccuracyEvaluator:
    def __init__(self):
        self.answer_processor = EvalAIAnswerProcessor()

    def _compute_answer_scores(self, raw_answers):
        """Compute soft accuracy scores from 10 ground truth answers."""

    def eval_pred_list(self, pred_list, disable_tqdm=False):
        """
        Evaluate a list of predictions against ground truth.

        Args:
            pred_list: List[Dict] with keys 'pred_answer' and 'gt_answers'
            disable_tqdm: bool - Disable progress bar

        Returns:
            float - Average VQA accuracy across all predictions
        """
```
Import
from eval.vqa.textvqa_eval import TextVQAAccuracyEvaluator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pred_list | List[Dict] | Yes | List of dicts with 'pred_answer' (str) and 'gt_answers' (List[str], 10 answers) |
Outputs
| Name | Type | Description |
|---|---|---|
| accuracy | float | Average VQA soft accuracy (0.0 to 1.0) |
Usage Examples
Compute VQA Accuracy
```python
from eval.vqa.textvqa_eval import TextVQAAccuracyEvaluator

evaluator = TextVQAAccuracyEvaluator()
pred_list = [
    {
        'pred_answer': 'golden gate bridge',
        'gt_answers': ['golden gate bridge'] * 8 + ['golden gate', 'bridge'],
    },
    {
        'pred_answer': 'cat',
        'gt_answers': ['cat'] * 5 + ['kitten'] * 3 + ['feline', 'kitty'],
    },
]
accuracy = evaluator.eval_pred_list(pred_list)
print(f'VQA Accuracy: {accuracy:.4f}')
```
Related Pages
Implements Principle
Requires Environment