Implementation:Open compass VLMEvalKit Omni Verifier

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, VQA, LLM Judge

Overview

Provides a GPT-based semantic verification system for evaluating model responses against ground truth answers in visual question-answering tasks.

Description

This module implements an evaluation template (`EVAL_TMPL`) that instructs a GPT judge to determine semantic equivalence between model responses and ground truth answers. The verification considers a response correct if it conveys the same meaning (even with different phrasing) or provides additional relevant details, and incorrect if it contradicts the ground truth or includes incorrect information. The `_process_digit_article` function normalizes text by converting word-form numbers to digits and removing articles.
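The normalization described above can be illustrated with a minimal sketch. It is not the module's actual `_process_digit_article` code: the function name `normalize_digits_and_articles`, the lookup tables, and the lowercasing step are all assumptions made for illustration.

```python
# Hypothetical lookup tables; the real module may cover more words.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}
ARTICLES = {"a", "an", "the"}


def normalize_digits_and_articles(text: str) -> str:
    """Lowercase the text, map number words to digits, and drop articles."""
    kept = []
    for word in text.lower().split():
        word = WORD_TO_DIGIT.get(word, word)  # "two" -> "2"
        if word not in ARTICLES:              # skip "a", "an", "the"
            kept.append(word)
    return " ".join(kept)


print(normalize_digits_and_articles("The two dogs chase a ball"))
# prints: 2 dogs chase ball
```

Normalizing both strings this way before any comparison removes superficial differences, such as "two" versus "2", that carry no semantic weight.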

Usage

Called internally by the corresponding dataset class during evaluation.
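As a rough sketch of how a dataset class might drive such a verifier during evaluation, the loop below formats a prompt for each (response, ground truth) pair and counts the pairs the judge accepts. Everything here is hypothetical: `DEMO_TMPL` stands in for the real `EVAL_TMPL`, and `judge_accuracy` and `stub_judge` are illustrative names, not part of VLMEvalKit.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for EVAL_TMPL; the real wording and placeholder
# names live in omni_verifier.py and may differ.
DEMO_TMPL = (
    "Model response: {response}\n"
    "Ground truth: {ground_truth}\n"
    "Does the response match the ground truth? Answer yes or no."
)


def judge_accuracy(
    pairs: Iterable[Tuple[str, str]],
    judge: Callable[[str], str],
) -> float:
    """Return the fraction of (response, ground_truth) pairs the judge accepts."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    accepted = 0
    for response, ground_truth in pairs:
        prompt = DEMO_TMPL.format(response=response, ground_truth=ground_truth)
        verdict = judge(prompt).strip().lower()
        if verdict.startswith("yes"):
            accepted += 1
    return accepted / len(pairs)


def stub_judge(prompt: str) -> str:
    """Toy judge for offline testing: case-insensitive exact match only."""
    lines = prompt.splitlines()
    response = lines[0].split(": ", 1)[1]
    ground_truth = lines[1].split(": ", 1)[1]
    return "yes" if response.lower() == ground_truth.lower() else "no"


print(judge_accuracy([("Paris", "paris"), ("Rome", "Paris")], stub_judge))
# prints: 0.5
```

Passing the judge in as a callable keeps the loop testable without network access; in real use the callable would wrap a call to the GPT judge.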

Code Reference

Source: vlmeval/dataset/utils/omni_verifier.py, Lines: L1-220
Import: from vlmeval.dataset.utils.omni_verifier import EVAL_TMPL, _process_digit_article

Key Definitions:

EVAL_TMPL = """..."""
def _process_digit_article(inText): ...

I/O Contract

Direction	Description
Inputs	Model response string and ground truth answer string for semantic comparison
Outputs	"yes" or "no" string indicating semantic correctness
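A judge model does not always reply with a bare "yes" or "no", so a caller may want to reduce the free-form reply to the contract above. The helper below is one possible approach, not the module's own parsing logic; `parse_verdict` is a hypothetical name.

```python
import re
from typing import Optional


def parse_verdict(judge_reply: str) -> Optional[str]:
    """Return "yes" or "no" if exactly one appears as a whole word, else None."""
    found = set(re.findall(r"\b(yes|no)\b", judge_reply.lower()))
    if len(found) == 1:
        return found.pop()
    return None  # ambiguous or missing verdict


print(parse_verdict("Yes."))
print(parse_verdict("No, the response contradicts the ground truth."))
print(parse_verdict("It is unknown."))
```

Returning `None` for ambiguous replies lets the caller decide whether to retry the judge or count the sample as incorrect.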

Usage Examples

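from vlmeval.dataset.utils.omni_verifier import EVAL_TMPL

# Fill the judge prompt with the model response and the ground truth answer.
prompt = EVAL_TMPL.format(response="A red car", ground_truth="A red vehicle")
# The formatted prompt is then sent to the GPT judge, which answers "yes" or "no".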

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
