Implementation:Open compass VLMEvalKit Build Judge

From Leeroopedia
Revision as of 13:28, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Open_compass_VLMEvalKit_Build_Judge.md)
| Field | Value |
|---|---|
| Source | VLMEvalKit |
| Domain | Vision, Evaluation, NLP |

Overview

A concrete tool, provided by VLMEvalKit, for constructing LLM judge model instances from shorthand names.

Description

build_judge() in vlmeval/dataset/utils/judge_util.py resolves a judge shorthand name to a specific model version via a hardcoded mapping (e.g., "chatgpt-0125" to "gpt-3.5-turbo-0125", "gpt-4o" to "gpt-4o-2024-05-13"). It then wraps the resolved model in the appropriate client class: OpenAIWrapper for GPT models, SiliconFlowAPI for Qwen/DeepSeek, and HFChatModel for local LLMs. The function calls load_env() to ensure API keys are available, and it supports the LOCAL_LLM environment variable as an override for using a local model instead.
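
The resolution and dispatch steps described above can be illustrated with a minimal, self-contained sketch. It is not the VLMEvalKit source: `resolve_judge` and `JUDGE_VERSIONS` are hypothetical names, the mapping holds only the two entries quoted above, and the prefix-based dispatch rule and the exact effect of LOCAL_LLM are assumptions made for illustration. The sketch returns the wrapper class name as a string instead of constructing a real client, so it runs without API keys.

```python
from typing import Mapping, Optional, Tuple

# Only the two entries quoted in the description; the real mapping is larger.
JUDGE_VERSIONS = {
    "chatgpt-0125": "gpt-3.5-turbo-0125",
    "gpt-4o": "gpt-4o-2024-05-13",
}


def resolve_judge(env: Optional[Mapping[str, str]] = None, **kwargs) -> Tuple[str, str, dict]:
    """Return (wrapper class name, model version, remaining kwargs)."""
    env = {} if env is None else env
    model = kwargs.pop("model", None)
    kwargs.pop("nproc", None)  # accepted but ignored, as documented

    # Assumption: LOCAL_LLM, when set, replaces the resolved model version.
    local_llm = env.get("LOCAL_LLM")
    version = local_llm if local_llm else JUDGE_VERSIONS.get(model, model)

    # Assumption: the wrapper is chosen from the shorthand's family prefix;
    # the real dispatch rule may differ.
    name = (model or "").lower()
    if name.startswith(("qwen", "deepseek")):
        wrapper = "SiliconFlowAPI"
    elif name.startswith(("gpt", "chatgpt")):
        wrapper = "OpenAIWrapper"
    else:
        # Anything else is treated as a local LLM in this sketch.
        wrapper = "HFChatModel"
    return wrapper, version, kwargs
```

Passing the environment as an explicit `env` argument, rather than reading `os.environ` directly, keeps the sketch deterministic; the remaining keyword arguments are returned untouched to mirror how they would be forwarded to the wrapper constructor.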

Usage

build_judge() is called internally by dataset.evaluate() methods. It can also be called directly when building a custom evaluation pipeline.
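
As a rough illustration of the direct-call case, the sketch below scores a list of predictions with any object that exposes `generate(prompt)`. It assumes `generate` returns a string; the prompt template, the yes/no parsing, `judge_accuracy`, and the offline `EchoJudge` stand-in are all hypothetical and not part of VLMEvalKit.

```python
from typing import Iterable, List, Protocol


class Judge(Protocol):
    def generate(self, prompt: str) -> str: ...


class EchoJudge:
    """Hypothetical offline stand-in so the sketch runs without API keys."""

    def generate(self, prompt: str) -> str:
        # Answers "yes" when the Prediction line equals the Reference line.
        fields = dict(
            line.split(": ", 1) for line in prompt.splitlines() if ": " in line
        )
        return "yes" if fields.get("Prediction") == fields.get("Reference") else "no"


def judge_accuracy(judge: Judge, records: Iterable[dict]) -> float:
    """Fraction of records the judge marks as correct."""
    verdicts: List[bool] = []
    for rec in records:
        # Hypothetical prompt template; real datasets define their own.
        prompt = (
            f"Question: {rec['question']}\n"
            f"Reference: {rec['answer']}\n"
            f"Prediction: {rec['prediction']}\n"
            "Reply with yes or no."
        )
        verdicts.append(judge.generate(prompt).strip().lower().startswith("yes"))
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

With API keys configured, `EchoJudge()` could be swapped for the object returned by `build_judge(model="chatgpt-0125")`, since the loop relies only on the `generate` method.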

Code Reference

  • Source: vlmeval/dataset/utils/judge_util.py, Lines: L7-40
  • Signature:
def build_judge(**kwargs) -> Union[OpenAIWrapper, SiliconFlowAPI, HFChatModel]:
    """
    Args:
        model (str): Judge name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b").
        nproc (int): Ignored (popped from kwargs).
        **kwargs: Passed to the API wrapper constructor.
    Returns:
        API wrapper instance for the judge model.
    """
  • Import: from vlmeval.dataset.utils.judge_util import build_judge

I/O Contract

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | model | str | Judge shorthand name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b") |
| Input | **kwargs | dict | Passed to the API wrapper constructor |
| Output | judge | OpenAIWrapper / SiliconFlowAPI / HFChatModel | API wrapper instance for the judge model |

Usage Examples

from vlmeval.dataset.utils.judge_util import build_judge

# Build a GPT-3.5 judge ("chatgpt-0125" resolves to "gpt-3.5-turbo-0125")
judge = build_judge(model="chatgpt-0125")
# Use it to evaluate a response
response = judge.generate("Is the answer 'B' correct for this question? ...")

# Build a GPT-4o judge ("gpt-4o" resolves to "gpt-4o-2024-05-13")
judge_4o = build_judge(model="gpt-4o")
