Implementation:Open compass VLMEvalKit Omni Verifier
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, VQA, LLM Judge |
Overview
Provides a GPT-based semantic verification system for evaluating model responses against ground truth answers in visual question-answering tasks.
Description
This module implements an evaluation template (`EVAL_TMPL`) that instructs a GPT judge to determine semantic equivalence between model responses and ground truth answers. The verification considers a response correct if it conveys the same meaning (even with different phrasing) or provides additional relevant details, and incorrect if it contradicts the ground truth or includes incorrect information. The `_process_digit_article` function normalizes text by converting word-form numbers to digits and removing articles.
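The normalization described above can be pictured with a minimal sketch. It is a standalone illustration, not the code in `omni_verifier.py`: the function name `normalize_digits_and_articles`, the `NUMBER_WORDS` table, and the `ARTICLES` set are hypothetical, and the real `_process_digit_article` may cover more words and handle punctuation differently.

```python
# Hypothetical stand-in for illustration only; not the module's implementation.
# Assumed mapping of word-form numbers to digits (the real table may differ).
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}
# Assumed set of articles to drop.
ARTICLES = {"a", "an", "the"}


def normalize_digits_and_articles(text: str) -> str:
    """Lowercase the text, drop articles, and replace number words with digits."""
    words = []
    for word in text.lower().split():
        if word in ARTICLES:
            continue  # articles carry no meaning for answer matching
        words.append(NUMBER_WORDS.get(word, word))
    return " ".join(words)


print(normalize_digits_and_articles("The two dogs chase a ball"))
# prints: 2 dogs chase ball
```

Normalizing both the response and the ground truth this way lets trivially different phrasings, such as "two" versus "2", compare as equal.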
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/omni_verifier.py`, Lines: L1-220
- Import: `from vlmeval.dataset.utils.omni_verifier import EVAL_TMPL, _process_digit_article`
Key Functions:
EVAL_TMPL = """..."""
def _process_digit_article(inText): ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Model response string and ground truth answer string for semantic comparison |
| Outputs | "yes" or "no" string indicating semantic correctness |
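As a sketch of how a caller might consume the "yes" or "no" output, the helper below maps a raw judge reply to a boolean. The name `parse_verdict` and its tolerance for case, surrounding whitespace, and a trailing period are assumptions made for illustration; the module's own handling of the judge reply may differ.

```python
def parse_verdict(judge_reply: str) -> bool:
    """Map a judge reply of "yes" or "no" to True or False (hypothetical helper)."""
    # Tolerate surrounding whitespace, capitalization, and a trailing period.
    verdict = judge_reply.strip().lower().rstrip(".")
    if verdict == "yes":
        return True
    if verdict == "no":
        return False
    # Anything else is treated as an unusable reply rather than guessed at.
    raise ValueError(f"Unexpected judge reply: {judge_reply!r}")


print(parse_verdict(" Yes. "))  # prints: True
print(parse_verdict("no"))      # prints: False
```

Raising on unexpected replies, instead of defaulting to "no", keeps malformed judge output from being silently scored as an incorrect answer.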
Usage Examples
```python
from vlmeval.dataset.utils.omni_verifier import EVAL_TMPL
prompt = EVAL_TMPL.format(response="A red car", ground_truth="A red vehicle")
```