Implementation:OpenGVLab InternVL POPE Benchmark Evaluation

Knowledge Sources	OpenGVLab_InternVL
Domains	Evaluation, Benchmark, Hallucination_Detection
Last Updated	2026-02-07 14:00 GMT

Overview

This script evaluates model predictions on the POPE (Polling-based Object Probing Evaluation) benchmark by computing accuracy, precision, recall, F1 score, and yes-ratio metrics for object hallucination detection.

Description

The eval_pope.py script implements the evaluation logic for the POPE benchmark, which measures object hallucination in vision-language models. The evaluation works as follows:

Answer normalization: Each free-text model response is reduced to a binary yes/no prediction: it is labeled "no" if its first sentence contains any of the keywords "No", "not", or "no", and "yes" otherwise
Label loading: Ground truth labels ("yes"/"no") are loaded from annotation files in the specified directory
Per-category evaluation: The script iterates over annotation files matching the pattern coco_pope_*.json, filtering predictions by category (e.g., "random", "popular", "adversarial")
Metric computation: For each category, it computes TP, FP, TN, FN counts and derives accuracy, precision, recall, F1 score, and yes-ratio
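
The answer-normalization step can be sketched as follows. This is a minimal illustration of the keyword rule described in the list above, not a copy of the script: the helper name normalize_answer is hypothetical, and the details of splitting on the first period, stripping commas, and tokenizing on spaces are assumptions about how "first sentence" and "keyword" are interpreted.

```python
def normalize_answer(text: str) -> str:
    """Map a free-text response to 'yes' or 'no' (hypothetical helper)."""
    # Assumption: the "first sentence" is everything before the first period.
    first_sentence = text.split(".")[0]
    # Assumption: commas are dropped so "No, there is ..." yields the token "No".
    words = first_sentence.replace(",", "").split(" ")
    # Any negation keyword in the first sentence yields "no".
    if "No" in words or "not" in words or "no" in words:
        return "no"
    # Everything else, including ambiguous answers, defaults to "yes".
    return "yes"
```

Note that under this rule a response with no negation keyword in its first sentence counts as "yes", so unparseable or evasive answers inflate the yes-ratio rather than being discarded.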

The yes-ratio metric is particularly important for POPE as it reveals model bias toward answering "yes" regardless of the question, a common hallucination pattern.
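
To make the metrics concrete, the sketch below derives them from lists of normalized predictions and labels. The function name pope_metrics and the returned dictionary layout are hypothetical, "yes" is treated as the positive class, and the zero-division guards are an addition for safety that the original script may not include.

```python
def pope_metrics(preds, labels):
    """Compute POPE-style metrics from equal-length lists of 'yes'/'no' strings."""
    pairs = list(zip(preds, labels))
    # "yes" is treated as the positive class.
    tp = sum(1 for p, l in pairs if p == "yes" and l == "yes")
    fp = sum(1 for p, l in pairs if p == "yes" and l == "no")
    tn = sum(1 for p, l in pairs if p == "no" and l == "no")
    fn = sum(1 for p, l in pairs if p == "no" and l == "yes")
    total = len(pairs)

    # Guards against empty denominators are an assumption of this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    # Fraction of all predictions that are "yes", independent of the labels.
    yes_ratio = (tp + fp) / total if total else 0.0

    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "yes_ratio": yes_ratio}


example = pope_metrics(["yes", "yes", "no", "yes"], ["yes", "no", "no", "no"])
```

In this example the model answers "yes" three times out of four while only one label is "yes", so the yes-ratio (0.75) sits well above the share of positive labels (0.25), which is the bias the metric is meant to expose.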

Usage

Use this script to evaluate LLaVA model outputs on the POPE benchmark after generating predictions with a VQA inference script. It provides per-category hallucination metrics across random, popular, and adversarial sampling strategies.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: internvl_chat_llava/llava/eval/eval_pope.py
Lines: 1-81

Signature

def eval_pope(answers: list, label_file: str) -> None: ...

Import

# This is a standalone CLI script, not typically imported
# Run via: python eval_pope.py --annotation-dir /path/to/pope --question-file questions.jsonl --result-file results.jsonl

I/O Contract

Inputs

Name	Type	Required	Description
--annotation-dir	str (dir path)	Yes	Directory containing POPE annotation files (coco_pope_*.json)
--question-file	str (file path)	Yes	Path to JSONL file with questions (containing question_id and category)
--result-file	str (file path)	Yes	Path to JSONL file with model predictions (containing question_id and text)
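
The relationship between the question file and the result file can be illustrated with a small sketch. The record contents are made-up sample data, and the helper answers_for_category is hypothetical; only the field names question_id, category, and text come from the table above.

```python
import json

# Made-up sample lines standing in for the two JSONL inputs.
question_lines = [
    '{"question_id": 1, "category": "random"}',
    '{"question_id": 2, "category": "adversarial"}',
]
result_lines = [
    '{"question_id": 1, "text": "Yes, there is a dog."}',
    '{"question_id": 2, "text": "No, there is no cat."}',
]

# Index questions by id so each prediction can be mapped to its category.
questions = {q["question_id"]: q for q in map(json.loads, question_lines)}
answers = [json.loads(line) for line in result_lines]


def answers_for_category(category):
    """Return the predictions whose question belongs to the given category."""
    return [a for a in answers
            if questions[a["question_id"]]["category"] == category]


random_answers = answers_for_category("random")
```

Every question_id in the result file must also appear in the question file; otherwise the lookup in this sketch raises a KeyError.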

Outputs

Name	Type	Description
stdout	text	Per-category TP/FP/TN/FN counts, accuracy, precision, recall, F1, and yes-ratio

Usage Examples

Basic Usage

# Command-line execution for POPE evaluation
# python internvl_chat_llava/llava/eval/eval_pope.py \
#     --annotation-dir /path/to/coco_pope_annotations \
#     --question-file pope_questions.jsonl \
#     --result-file model_predictions.jsonl

Related Pages

Principle:OpenGVLab_InternVL_VQA_Accuracy_Scoring
