Implementation: OpenCompass VLMEvalKit Build Judge
| Field | Value |
|---|---|
| Source | VLMEvalKit |
| Domain | Vision, Evaluation, NLP |
Overview
Concrete tool for constructing LLM judge model instances from shorthand names provided by VLMEvalKit.
Description
build_judge() in vlmeval/dataset/utils/judge_util.py resolves judge shorthand names to specific model versions via a hardcoded mapping (e.g., "chatgpt-0125" to "gpt-3.5-turbo-0125", "gpt-4o" to "gpt-4o-2024-05-13") and wraps the result in the appropriate client class: OpenAIWrapper for GPT models, SiliconFlowAPI for Qwen/DeepSeek models, or HFChatModel for local LLMs. It calls load_env() to ensure API keys are available, and honors the LOCAL_LLM environment variable as an override to use a local model instead of a remote API.
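As an illustration, the resolution-plus-override step can be sketched in a few lines. This is not the code in judge_util.py: the mapping below mirrors only the two examples named above (the library's hardcoded table is larger), and resolve_judge is a hypothetical helper name.

```python
import os

# Minimal sketch of build_judge()'s shorthand resolution. The mapping
# mirrors only the examples in this document, not the library's full table.
JUDGE_VERSION_MAP = {
    "chatgpt-0125": "gpt-3.5-turbo-0125",
    "gpt-4o": "gpt-4o-2024-05-13",
}

def resolve_judge(model: str) -> str:
    """Resolve a judge shorthand, honoring the LOCAL_LLM override."""
    local = os.environ.get("LOCAL_LLM")
    if local:  # a local model name/path overrides the requested shorthand
        return local
    # Unknown shorthands pass through unchanged in this sketch
    return JUDGE_VERSION_MAP.get(model, model)
```

The override check runs first, which is why setting LOCAL_LLM redirects every judge request to the local model regardless of the shorthand passed in.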
Usage
Called internally by dataset.evaluate() methods. Can also be called directly when building a custom evaluation pipeline.
Code Reference
- Source: vlmeval/dataset/utils/judge_util.py, Lines: L7-40
- Signature:
```python
def build_judge(**kwargs) -> Union[OpenAIWrapper, SiliconFlowAPI, HFChatModel]:
    """
    Args:
        model (str): Judge name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b").
        nproc (int): Ignored (popped from kwargs).
        **kwargs: Passed to the API wrapper constructor.

    Returns:
        API wrapper instance for the judge model.
    """
```
- Import: `from vlmeval.dataset.utils.judge_util import build_judge`
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | model | str | Judge shorthand name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b") |
| Input | **kwargs | dict | Passed to the API wrapper constructor |
| Output | judge | OpenAIWrapper / SiliconFlowAPI / HFChatModel | API wrapper instance for the judge model |
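The choice among the three output wrapper classes can be sketched as a dispatch on the model-family prefix. This is an illustrative reimplementation, not the actual logic in judge_util.py; the prefix rules and the pick_wrapper helper name are inferred from the families listed in the Description.

```python
# Illustrative sketch of wrapper-class dispatch by judge model family.
# The real selection logic lives in vlmeval/dataset/utils/judge_util.py;
# the prefixes here are inferred from the families named in this document.
def pick_wrapper(model: str) -> str:
    """Return the name of the client class a given judge would use."""
    name = model.lower()
    if name.startswith(("gpt", "chatgpt")):
        return "OpenAIWrapper"   # GPT-family judges via the OpenAI API
    if name.startswith(("qwen", "deepseek")):
        return "SiliconFlowAPI"  # Qwen/DeepSeek judges via SiliconFlow
    return "HFChatModel"         # fallback: local Hugging Face chat model
```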
Usage Examples
```python
from vlmeval.dataset.utils.judge_util import build_judge

# Build a GPT-3.5 judge
judge = build_judge(model="chatgpt-0125")

# Use it to evaluate a response
response = judge.generate("Is the answer 'B' correct for this question? ...")

# Build a GPT-4o judge
judge_4o = build_judge(model="gpt-4o")
```