Implementation:Open compass VLMEvalKit OCR Reasoning Utils
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, OCR, Reasoning |
Overview
Provides GPT-based answer extraction and judge-based scoring for the OCR Reasoning benchmark, with bilingual (Chinese/English) evaluation support.
Description
This module implements `get_gpt4_ICE`, which supplies five bilingual in-context examples for extracting answers from OCR-related model responses, and `build_ocrr_gpt4_prompt`, which builds the extraction prompt for a single data line. It also defines `judge_prompts`, a template for rating-based evaluation. In this template, a GPT judge compares the model answer against the reference answer and assigns a correctness score on a 1-10 scale, reported in the format "Rating: N".
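The description fixes only the judge's output format ("Rating: N") and the 1-10 scale. The sketch below shows one way to recover a score from a judge reply, assuming the judge emits that line verbatim. `parse_rating` and its regex are hypothetical and are not part of the module, whose own parsing may differ.

```python
import re
from typing import Optional

# Assumed pattern for the judge's "Rating: N" line; not taken from the module.
RATING_RE = re.compile(r"Rating:\s*(\d+)")

def parse_rating(judge_output: str) -> Optional[int]:
    """Hypothetical helper: return the last 1-10 rating in a judge reply, else None."""
    matches = RATING_RE.findall(judge_output)
    if not matches:
        return None  # judge did not follow the requested format
    score = int(matches[-1])  # use the final rating if the judge repeats the format
    return score if 1 <= score <= 10 else None  # reject out-of-scale values

print(parse_rating("The answer matches the reference.\nRating: 9"))  # 9
print(parse_rating("No score given."))  # None
```

Returning `None` instead of a default score keeps malformed judge replies distinguishable from genuinely low ratings.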
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/ocr_reasoning.py`, lines 1-169
- Import: `from vlmeval.dataset.utils.ocr_reasoning import build_ocrr_gpt4_prompt, get_gpt4_ICE`
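Key definitions:
```python
def get_gpt4_ICE(): ...  # returns the five bilingual in-context examples
def build_ocrr_gpt4_prompt(line): ...  # builds the GPT-4 extraction prompt for one data line
judge_prompts = """..."""  # 1-10 rating template for the GPT judge (text elided)
```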
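The composition step behind `build_ocrr_gpt4_prompt` can be sketched as follows. This assumes the prompt consists of an instruction, then the in-context examples, then the target question and prediction. `build_extraction_prompt`, `TASK`, and the field labels are hypothetical, and this page does not show the real function's wording or layout.

```python
from typing import Mapping, Sequence

# Hypothetical task instruction; the module's real wording is not reproduced here.
TASK = (
    "Extract the final answer from the model response to the question. "
    "The question and response may be in Chinese or English."
)

def build_extraction_prompt(line: Mapping[str, str], examples: Sequence[str]) -> str:
    """Sketch of an extraction prompt: instruction, in-context examples, then the target item."""
    parts = [TASK]
    parts.extend(examples)  # stand-in for the examples returned by get_gpt4_ICE()
    parts.append(
        f"Question: {line['question']}\n"
        f"Model response: {line['prediction']}\n"
        "Extracted answer:"  # the extractor model continues from here
    )
    return "\n\n".join(parts)

# Made-up example in the assumed format; the real examples come from get_gpt4_ICE().
examples = [
    "Question: What is the store name?\n"
    "Model response: The receipt is from ABC Mart.\n"
    "Extracted answer: ABC Mart"
]
prompt = build_extraction_prompt({"question": "...", "prediction": "..."}, examples)
print(prompt.endswith("Extracted answer:"))  # True
```

Ending the prompt with an open "Extracted answer:" slot lets the extractor reply with just the answer. The in-context examples demonstrate that pattern in both languages.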
I/O Contract
| Direction | Description |
|---|---|
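| Inputs | A data line (dict-like record) with `question` and `prediction` fields, passed to `build_ocrr_gpt4_prompt` for answer extraction |
| Outputs | `build_ocrr_gpt4_prompt` returns a formatted GPT-4 extraction prompt string; `judge_prompts` provides the rating-based (1-10) judge evaluation prompt |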
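The judge side can be illustrated in the same way. This page elides the actual `judge_prompts` text, so the template below is a hypothetical stand-in. In particular, the `answer` key for the reference answer is an assumption, since the contract names only `question` and `prediction`.

```python
from typing import Mapping

# Hypothetical stand-in for judge_prompts; the real template and its placeholder names may differ.
JUDGE_TEMPLATE = (
    "Compare the model answer with the reference answer for the question below "
    "and rate its correctness on a scale of 1 to 10.\n"
    "Question: {question}\n"
    "Reference answer: {answer}\n"
    "Model answer: {prediction}\n"
    'Reply in the format "Rating: N".'
)

def build_judge_prompt(line: Mapping[str, str]) -> str:
    """Fill the judge template from one data line (assumes 'question', 'answer', 'prediction' keys)."""
    return JUDGE_TEMPLATE.format(
        question=line["question"],
        answer=line["answer"],  # assumed name of the reference-answer field
        prediction=line["prediction"],
    )

print(build_judge_prompt({"question": "Q", "answer": "A", "prediction": "P"}))
```

Because `judge_prompts` is a plain module-level string, `str.format` is enough to fill a template like this. Braces inside the substituted answers are not re-interpreted.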
Usage Examples
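```python
from vlmeval.dataset.utils.ocr_reasoning import build_ocrr_gpt4_prompt
# One evaluation record; the field values here are placeholders.
line = {'question': '...', 'prediction': '...'}
prompt = build_ocrr_gpt4_prompt(line)
```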