
Implementation:Mlfoundations Open flamingo Evaluate vqa

From Leeroopedia



Overview

A concrete tool for running few-shot visual question answering (VQA) evaluation with official VQA accuracy scoring across four benchmarks, provided by the OpenFlamingo evaluation module.

Description

The evaluate_vqa() function:

  1. Loads train and test VQADataset splits
  2. Selects few-shot examples (random or RICES)
  3. Constructs prompts with "<image>Question:{q} Short answer:{a}<|endofchunk|>" format
  4. Generates answers via eval_model.get_outputs() using beam search (defaults: 3 beams, at most 5 new tokens)
  5. Post-processes answers (extract text before "Question"/"Answer" tokens)
  6. Gathers predictions across ranks
  7. Computes VQA accuracy via compute_vqa_accuracy()

Handles dataset-specific paths and formats for VQAv2, OK-VQA, VizWiz, and TextVQA.
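The prompt construction (step 3) and answer post-processing (step 5) can be sketched in isolation. The helper names below are illustrative, not the actual internals of evaluate_vqa(); only the prompt format and the "stop before Question/Answer" rule come from the description above.

```python
# Sketch of steps 3 and 5: build the few-shot prompt, then trim the
# generated text. Helper names are hypothetical; the real code lives
# inside evaluate_vqa().

def build_vqa_prompt(shots, query_question):
    """Concatenate few-shot (question, answer) pairs, then the query."""
    prompt = ""
    for q, a in shots:
        prompt += f"<image>Question:{q} Short answer:{a}<|endofchunk|>"
    # The query ends after "Short answer:" so the model completes it.
    prompt += f"<image>Question:{query_question} Short answer:"
    return prompt

def postprocess_vqa_answer(generated):
    """Keep only the text before a trailing 'Question'/'Answer' token."""
    for stop in ("Question", "Answer"):
        generated = generated.split(stop)[0]
    return generated.strip()
```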

Usage

Called from the main evaluation loop for VQA benchmarks.
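A hypothetical illustration of how such a loop could dispatch the four benchmarks; the actual loop in open_flamingo/eval/evaluate.py may be organized differently, and run_vqa_eval below stands in for the real evaluate_vqa() call so the sketch is self-contained.

```python
import argparse

def run_vqa_eval(args, dataset_name, num_shots):
    # Placeholder for evaluate_vqa(args, eval_model,
    # num_shots=num_shots, dataset_name=dataset_name).
    return 0.0

# Hypothetical per-benchmark flags on the eval config.
args = argparse.Namespace(eval_vqav2=True, eval_ok_vqa=False,
                          eval_vizwiz=False, eval_textvqa=True)

# Run only the benchmarks enabled in the config.
scores = {
    name: run_vqa_eval(args, name, num_shots=8)
    for name in ("vqav2", "ok_vqa", "vizwiz", "textvqa")
    if getattr(args, f"eval_{name}", False)
}
```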

Code Reference


Signature

def evaluate_vqa(
    args: argparse.Namespace,
    eval_model: BaseEvalModel,
    seed: int = 42,
    min_generation_length: int = 0,
    max_generation_length: int = 5,
    num_beams: int = 3,
    length_penalty: float = 0.0,
    num_shots: int = 8,
    dataset_name: str = "vqav2",
    cached_features=None,
) -> float:
    """Returns VQA accuracy score"""

def compute_vqa_accuracy(
    result_json_path: str,
    question_json_path: str,
    annotation_json_path: str,
) -> float:
    """Returns overall VQA accuracy"""

Import

from open_flamingo.eval.evaluate import evaluate_vqa
from open_flamingo.eval.vqa_metric import compute_vqa_accuracy

I/O Contract

Inputs

Name | Type | Required | Description
args | argparse.Namespace | Yes | Eval config with dataset paths
eval_model | BaseEvalModel | Yes | Model wrapper
seed | int | No | Random seed (default 42)
num_shots | int | No | Number of few-shot examples (default 8)
dataset_name | str | No | One of "vqav2", "ok_vqa", "vizwiz", "textvqa" (default "vqav2")
min_generation_length | int | No | Minimum tokens to generate (default 0)
max_generation_length | int | No | Maximum tokens to generate (default 5)
num_beams | int | No | Beam search width (default 3)
length_penalty | float | No | Beam search length penalty (default 0.0)
cached_features | Tensor | No | RICES features for retrieval-based example selection

Outputs

Type | Description
float | VQA accuracy score

Usage Examples

# Run 8-shot VQA evaluation on VQAv2
accuracy = evaluate_vqa(
    args=args,
    eval_model=eval_model,
    seed=42,
    num_shots=8,
    dataset_name="vqav2",
)
print(f"VQAv2 8-shot accuracy: {accuracy:.4f}")
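For reference, compute_vqa_accuracy() applies the official VQA accuracy rule. A minimal stdlib-only sketch of the per-question score is shown below; the official evaluator additionally normalizes answer text and averages over all 10-choose-9 subsets of the ten human annotations, which this sketch omits.

```python
# Hedged sketch of the official per-question VQA accuracy rule:
# an answer scores min(#matching human answers / 3, 1).

def vqa_accuracy(pred, human_answers):
    matches = sum(1 for a in human_answers if a == pred)
    return min(matches / 3.0, 1.0)
```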

Related Pages

Principle:Mlfoundations_Open_flamingo_Visual_Question_Answering_Evaluation
