Implementation:Open compass VLMEvalKit OCR Reasoning Utils

From Leeroopedia
Revision as of 13:31, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Open_compass_VLMEvalKit_OCR_Reasoning_Utils.md)
Field Value
source VLMEvalKit
domain Vision, Evaluation, OCR, Reasoning

Overview

Provides GPT-based answer extraction and judge-based scoring for the OCR Reasoning benchmark, with support for bilingual (Chinese/English) evaluation.

Description

This module defines `get_gpt4_ICE`, which supplies five bilingual in-context examples (ICE) for extracting answers from OCR-related model responses, and `build_ocrr_gpt4_prompt`, which constructs the extraction prompt for a single data line. It also defines `judge_prompts` for rating-based evaluation: a GPT judge compares the model answer against the reference answer and reports a correctness score on a 1-10 scale in the format "Rating: N".
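To make the extraction step concrete, here is a minimal sketch of how a few-shot extraction prompt could be assembled from in-context examples plus one data line. It is not the code in `ocr_reasoning.py`: the function names `get_examples_sketch` and `build_prompt_sketch`, the task description, and the example text are all hypothetical, and only two examples are shown instead of five. The only detail taken from this page is that the data line carries `'question'` and `'prediction'` fields.

```python
# Illustrative sketch only: names and example text are hypothetical,
# not copied from vlmeval/dataset/utils/ocr_reasoning.py.


def get_examples_sketch():
    """Return in-context examples (one English, one Chinese in this sketch)."""
    return [
        "Question: What is the price shown on the tag?\n"
        "Model response: The tag shows a price of 25 dollars.\n"
        "Extracted answer: 25\n",
        "Question: 图中的日期是哪一天?\n"
        "Model response: 图中显示的日期是2024年5月1日。\n"
        "Extracted answer: 2024年5月1日\n",
    ]


def build_prompt_sketch(line):
    """Join a task description, the examples, and the record to extract from."""
    parts = ["Extract the final answer from the model response.\n"]
    parts.extend(get_examples_sketch())
    # The prompt ends with an open "Extracted answer:" slot for the model to fill.
    parts.append(
        f"Question: {line['question']}\n"
        f"Model response: {line['prediction']}\n"
        "Extracted answer:"
    )
    return "\n".join(parts)
```

Ending the prompt on an unfilled "Extracted answer:" line mirrors the pattern set by the examples, which nudges the extractor model to reply with the answer alone.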

Usage

Called internally by the corresponding dataset class during evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/ocr_reasoning.py, Lines: L1-169
  • Import: from vlmeval.dataset.utils.ocr_reasoning import build_ocrr_gpt4_prompt, get_gpt4_ICE

Key Functions:

def get_gpt4_ICE(): ...
def build_ocrr_gpt4_prompt(line): ...
judge_prompts = """..."""

I/O Contract

Direction Description
Inputs A data line dict with 'question' and 'prediction' fields for answer extraction
Outputs Formatted GPT-4 prompt string for extraction; rating-based judge evaluation prompt
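
Because the judge is asked to answer in the form "Rating: N", the caller needs to recover the integer from the judge's reply. The sketch below shows one way this could be done. The helper name `parse_rating_sketch` and the choice to return `None` for missing or out-of-range ratings are assumptions, not behaviour documented for VLMEvalKit.

```python
import re

# Hypothetical helper: this page only states that the judge replies in the
# form "Rating: N" on a 1-10 scale; the parsing logic here is an assumption.
_RATING_RE = re.compile(r"Rating:\s*(\d+)")


def parse_rating_sketch(judge_reply):
    """Return the integer rating from a judge reply, or None if absent or out of range."""
    match = _RATING_RE.search(judge_reply)
    if match is None:
        return None
    rating = int(match.group(1))
    # Reject values outside the documented 1-10 scale instead of clamping them.
    return rating if 1 <= rating <= 10 else None
```

For example, `parse_rating_sketch("The answer matches the reference. Rating: 8")` yields `8`, while a reply with no "Rating:" marker yields `None`, so the caller can decide whether to retry or record a failure.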

Usage Examples

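from vlmeval.dataset.utils.ocr_reasoning import build_ocrr_gpt4_prompt

# `line` is one evaluation record with 'question' and 'prediction' fields (values here are illustrative)
line = {'question': 'What is the total on the receipt?', 'prediction': 'The total is $12.50.'}
prompt = build_ocrr_gpt4_prompt(line)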
