Implementation:Open compass VLMEvalKit Build Judge

From Leeroopedia
Field Value
Source VLMEvalKit
Domain Vision, Evaluation, NLP

Overview

A concrete tool, provided by VLMEvalKit, for constructing LLM judge model instances from shorthand names.

Description

build_judge() in vlmeval/dataset/utils/judge_util.py resolves a judge shorthand name to a specific model version via a hardcoded mapping (e.g., "chatgpt-0125" to "gpt-3.5-turbo-0125", "gpt-4o" to "gpt-4o-2024-05-13"). It then wraps the resolved model in the appropriate client class: OpenAIWrapper for GPT models, SiliconFlowAPI for Qwen/DeepSeek models, and HFChatModel for local LLMs. The function calls load_env() so that API keys are available, and it honors the LOCAL_LLM environment variable, which overrides the requested judge with a local model.
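The resolution step described above can be sketched as follows. This is a minimal illustration, not the code in judge_util.py: the two mapping entries are the ones quoted above, while `resolve_judge_name`, the family labels, the prefix-based checks, and the exact way `LOCAL_LLM` is consulted are assumptions made for the example.

```python
import os

# Only these two entries are taken from the description above;
# the real mapping in judge_util.py contains more names.
JUDGE_NAME_MAP = {
    "chatgpt-0125": "gpt-3.5-turbo-0125",
    "gpt-4o": "gpt-4o-2024-05-13",
}


def resolve_judge_name(model, env=None):
    """Return (family_label, resolved_model_name) for a judge shorthand.

    Hypothetical helper: the family labels and the prefix checks are
    illustrative assumptions, not VLMEvalKit's actual dispatch logic.
    """
    env = os.environ if env is None else env

    # Assumed override behaviour: a non-empty LOCAL_LLM replaces the judge.
    local_llm = env.get("LOCAL_LLM")
    if local_llm:
        return "local-override", local_llm

    # Fall back to the name itself when no shorthand is registered.
    resolved = JUDGE_NAME_MAP.get(model, model)

    if resolved.startswith("gpt-"):
        return "openai", resolved        # family served by OpenAIWrapper
    if resolved.lower().startswith(("qwen", "deepseek")):
        return "siliconflow", resolved   # family served by SiliconFlowAPI
    return "hf-chat", resolved           # assumed default: HFChatModel


print(resolve_judge_name("chatgpt-0125", env={}))
print(resolve_judge_name("gpt-4o", env={"LOCAL_LLM": "my-local-model"}))
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment; omitting it falls back to `os.environ`.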

Usage

build_judge() is called internally by the dataset.evaluate() methods. It can also be called directly when building a custom evaluation pipeline.
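For a custom pipeline, one possible shape is a small loop that sends a grading prompt to the judge for each record. The sketch below is an illustration under assumptions: `grade_records`, the prompt wording, the record keys, and the yes/no parsing are hypothetical, and `EchoJudge` is a stand-in so the example runs without API keys. The only interface assumed from the judge is a `generate(prompt)` method that returns a string.

```python
def grade_records(judge, records):
    """Ask the judge whether each prediction matches its reference answer.

    `records` is a list of dicts with hypothetical keys
    'question', 'answer', and 'prediction'.
    Returns a list of booleans, one per record.
    """
    verdicts = []
    for rec in records:
        prompt = (
            f"Question: {rec['question']}\n"
            f"Reference answer: {rec['answer']}\n"
            f"Model prediction: {rec['prediction']}\n"
            "Is the prediction correct? Reply with 'yes' or 'no'."
        )
        reply = judge.generate(prompt)
        # Treat any reply that starts with 'yes' (case-insensitive) as correct.
        verdicts.append(str(reply).strip().lower().startswith("yes"))
    return verdicts


class EchoJudge:
    """Stand-in judge for offline testing; not part of VLMEvalKit."""

    def generate(self, prompt):
        # Naive rule: answer 'yes' when reference and prediction lines match.
        lines = dict(line.split(": ", 1) for line in prompt.splitlines()[:3])
        same = lines["Reference answer"] == lines["Model prediction"]
        return "yes" if same else "no"


records = [
    {"question": "2 + 2 = ?", "answer": "4", "prediction": "4"},
    {"question": "Capital of France?", "answer": "Paris", "prediction": "Lyon"},
]
print(grade_records(EchoJudge(), records))
```

With the real library, `EchoJudge()` would be replaced by an object returned from `build_judge(model="chatgpt-0125")`, assuming the relevant API key is configured.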

Code Reference

  • Source: vlmeval/dataset/utils/judge_util.py, Lines: L7-40
  • Signature:
def build_judge(**kwargs) -> Union[OpenAIWrapper, SiliconFlowAPI, HFChatModel]:
    """
    Args:
        model (str): Judge name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b").
        nproc (int): Ignored (popped from kwargs).
        **kwargs: Passed to the API wrapper constructor.
    Returns:
        API wrapper instance for the judge model.
    """
  • Import: from vlmeval.dataset.utils.judge_util import build_judge
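The docstring above notes that `nproc` is popped and the remaining keyword arguments are forwarded to the wrapper constructor. A minimal sketch of that pattern follows; `split_judge_kwargs` is a hypothetical helper, and `temperature` and `retry` are example option names used only for illustration.

```python
def split_judge_kwargs(**kwargs):
    """Separate the judge name from the options forwarded to the wrapper.

    Hypothetical helper mirroring the documented behaviour: 'model' selects
    the judge, 'nproc' is discarded, everything else is passed through.
    """
    options = dict(kwargs)            # copy so the caller's dict is untouched
    model = options.pop("model", None)
    options.pop("nproc", None)        # dropped before forwarding
    return model, options


model, options = split_judge_kwargs(model="gpt-4o", nproc=4, temperature=0, retry=3)
print(model, options)
```

Discarding `nproc` rather than rejecting it presumably lets a caller reuse one settings dictionary that also carries parallelism options, without the wrapper constructor receiving an argument it does not expect.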

I/O Contract

Direction | Name | Type | Description
Input | model | str | Judge shorthand name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b")
Input | **kwargs | dict | Passed to the API wrapper constructor
Output | judge | OpenAIWrapper / SiliconFlowAPI / HFChatModel | API wrapper instance for the judge model

Usage Examples

from vlmeval.dataset.utils.judge_util import build_judge

# Build a GPT-3.5 judge
judge = build_judge(model="chatgpt-0125")
# Use it to evaluate a response
response = judge.generate("Is the answer 'B' correct for this question? ...")

# Build a GPT-4o judge
judge_4o = build_judge(model="gpt-4o")
