Implementation: mlfoundations/open_flamingo evaluate_vqa
Overview
A concrete tool in the OpenFlamingo evaluation module for running few-shot VQA evaluation with official VQA accuracy scoring across four benchmarks.
Description
The evaluate_vqa() function:
- Loads train and test VQADataset splits
- Selects few-shot examples (random or RICES)
- Constructs prompts in the format "<image>Question:{q} Short answer:{a}<|endofchunk|>"
- Generates answers via eval_model.get_outputs() with beam search (max 5 tokens)
- Post-processes answers (extracts text before "Question"/"Answer" tokens)
- Gathers predictions across ranks
- Computes VQA accuracy via compute_vqa_accuracy()

Handles dataset-specific paths and formats for VQAv2, OK-VQA, VizWiz, and TextVQA.
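The few-shot prompt construction described above can be sketched as follows. The helper name `build_vqa_prompt` is hypothetical (the real logic is inline in evaluate_vqa), but the prompt template matches the format quoted in this page.

```python
# Sketch of the few-shot prompt construction (hypothetical helper; the
# actual logic is inlined in evaluate_vqa in evaluate.py).
def build_vqa_prompt(shots, query_question):
    """Concatenate in-context examples, then the unanswered query.

    shots: list of (question, answer) pairs for the in-context examples.
    Each image is represented by the <image> token; the trailing query
    leaves "Short answer:" open for the model to complete.
    """
    context = "".join(
        f"<image>Question:{q} Short answer:{a}<|endofchunk|>" for q, a in shots
    )
    return context + f"<image>Question:{query_question} Short answer:"

prompt = build_vqa_prompt([("What color is the cat?", "black")], "How many dogs?")
```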
Usage
Called from the main evaluation loop for VQA benchmarks.
Code Reference
Source
- Repository: https://github.com/mlfoundations/open_flamingo
- File: open_flamingo/eval/evaluate.py, lines 899-1115 (evaluate_vqa)
- File: open_flamingo/eval/vqa_metric.py, lines 527-560 (compute_vqa_accuracy)
Signature
def evaluate_vqa(
args: argparse.Namespace,
eval_model: BaseEvalModel,
seed: int = 42,
min_generation_length: int = 0,
max_generation_length: int = 5,
num_beams: int = 3,
length_penalty: float = 0.0,
num_shots: int = 8,
dataset_name: str = "vqav2",
cached_features=None,
) -> float:
"""Returns VQA accuracy score"""
def compute_vqa_accuracy(
result_json_path: str,
question_json_path: str,
annotation_json_path: str,
) -> float:
"""Returns overall VQA accuracy"""
Import
from open_flamingo.eval.evaluate import evaluate_vqa
from open_flamingo.eval.vqa_metric import compute_vqa_accuracy
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | argparse.Namespace | Yes | Eval config with dataset paths |
| eval_model | BaseEvalModel | Yes | Model wrapper |
| seed | int | No | Random seed (default 42) |
| num_shots | int | No | Number of few-shot examples (default 8) |
| dataset_name | str | No | One of "vqav2", "ok_vqa", "vizwiz", "textvqa" |
| max_generation_length | int | No | Maximum tokens to generate (default 5) |
| cached_features | Tensor | No | RICES features for retrieval-based example selection |
Outputs
| Type | Description |
|---|---|
| float | VQA accuracy score |
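The returned score follows the official VQA accuracy rule: a predicted answer is fully correct when at least 3 of the 10 human annotators gave it. A simplified sketch of that rule (the official metric additionally averages over leave-one-out subsets of the annotations, which is omitted here):

```python
# Simplified sketch of the official VQA accuracy rule used by
# compute_vqa_accuracy: an answer scores min(matches / 3, 1), so it is
# fully correct once at least 3 human annotators agree with it.
# (The official metric also averages over leave-one-out subsets of the
# 10 annotations; that refinement is omitted in this sketch.)
def vqa_accuracy(predicted, human_answers):
    matches = sum(1 for a in human_answers if a == predicted)
    return min(matches / 3.0, 1.0)
```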
Usage Examples
# Run 8-shot VQA evaluation on VQAv2
accuracy = evaluate_vqa(
args=args,
eval_model=eval_model,
seed=42,
num_shots=8,
dataset_name="vqav2",
)
print(f"VQAv2 8-shot accuracy: {accuracy:.4f}")
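The post-processing step mentioned in the description can be sketched as follows. The helper name `postprocess_vqa_answer` is hypothetical; it illustrates truncating the generation before any "Question"/"Answer" token, since a short-answer model may start emitting a new in-context example.

```python
# Sketch of the answer post-processing step (hypothetical helper):
# truncate the generated text at the first "Question" or "Answer" token,
# then strip surrounding whitespace.
def postprocess_vqa_answer(generated):
    for stop in ("Question", "Answer"):
        idx = generated.find(stop)
        if idx != -1:
            generated = generated[:idx]
    return generated.strip()
```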
Related Pages
Principle:Mlfoundations_Open_flamingo_Visual_Question_Answering_Evaluation