Implementation:Open compass VLMEvalKit OCR Reasoning Utils

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, OCR, Reasoning

Overview

Provides GPT-based answer extraction and judge-based scoring for the OCR Reasoning benchmark, with support for bilingual (Chinese/English) evaluation.

Description

This module implements `get_gpt4_ICE`, which supplies five bilingual in-context examples (ICE) for extracting answers from OCR-related model responses, and `build_ocrr_gpt4_prompt`, which constructs the extraction prompt for a single data line. It also defines `judge_prompts` for rating-based evaluation: a GPT judge compares the model's answer against the reference answer and reports a correctness score on a 1-10 scale in the format "Rating: N".
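Because the judge reports its score as "Rating: N", the calling code has to recover the integer from free-form text. The following is a minimal sketch of that step, not the code in `ocr_reasoning.py`: the helper name `parse_rating` and the choice to return `None` for missing or out-of-range ratings are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; it is not part of ocr_reasoning.py.
_RATING_RE = re.compile(r"Rating:\s*(\d+)")


def parse_rating(judge_reply: str) -> Optional[int]:
    """Return the 1-10 rating found in a judge reply, or None if absent or out of range."""
    match = _RATING_RE.search(judge_reply)
    if match is None:
        return None  # the judge did not follow the "Rating: N" format
    rating = int(match.group(1))
    # Assumption: ratings outside the documented 1-10 scale are treated as invalid.
    return rating if 1 <= rating <= 10 else None


reply = "The answer matches the reference. Rating: 8"
score = parse_rating(reply)
```

Returning `None` rather than a default score keeps malformed judge replies visible to the caller, which can then retry or skip the sample.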

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

Source: vlmeval/dataset/utils/ocr_reasoning.py, Lines: L1-169
Import: from vlmeval.dataset.utils.ocr_reasoning import build_ocrr_gpt4_prompt, get_gpt4_ICE

Key Functions:

def get_gpt4_ICE(): ...
def build_ocrr_gpt4_prompt(line): ...
judge_prompts = """..."""

I/O Contract

Direction	Description
Inputs	A data line dict with 'question' and 'prediction' fields for answer extraction
Outputs	Formatted GPT-4 prompt string for extraction; rating-based judge evaluation prompt
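To make the contract above concrete, here is a rough sketch of how an extraction prompt can be assembled from a task description, a list of in-context examples, and one data line. It is an illustration under stated assumptions, not the body of `build_ocrr_gpt4_prompt`: the function name `build_extraction_prompt`, the field labels in the final block, and the blank-line separator are all hypothetical.

```python
from typing import Dict, List


def build_extraction_prompt(
    line: Dict[str, object],
    examples: List[str],
    task_description: str,
) -> str:
    """Assemble a few-shot prompt: instructions, then examples, then the query."""
    parts = [task_description.strip()]
    # Each in-context example is assumed to be an already formatted string.
    parts.extend(example.strip() for example in examples)
    # The query ends with an open "Extracted answer:" slot for the model to fill.
    parts.append(
        f"Question: {line['question']}\n"
        f"Model response: {str(line['prediction'])}\n"
        "Extracted answer:"
    )
    return "\n\n".join(parts)


sample_line = {"question": "What is the invoice total?", "prediction": "The total is 42.50."}
sample_prompt = build_extraction_prompt(
    sample_line,
    examples=["Example A", "Example B"],
    task_description="Extract the final answer from the model response.",
)
```

Placing the examples before the query and ending on an unfinished answer slot is the usual few-shot layout; the real module may order or label these parts differently.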

Usage Examples

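from vlmeval.dataset.utils.ocr_reasoning import build_ocrr_gpt4_prompt

# `line` is one evaluation record; the field values below are illustrative only.
line = {'question': 'What is the invoice total?', 'prediction': 'The total is 42.50.'}
prompt = build_ocrr_gpt4_prompt(line)  # formatted GPT-4 extraction prompt string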

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
