Implementation: OpenCompass VLMEvalKit Build Judge
| Field | Value |
|---|---|
| Source | VLMEvalKit |
| Domain | Vision, Evaluation, NLP |
Overview
Concrete tool for constructing LLM judge model instances from shorthand names provided by VLMEvalKit.
Description
build_judge() in vlmeval/dataset/utils/judge_util.py resolves judge shorthand names to specific model versions via a hardcoded mapping (e.g., "chatgpt-0125" to "gpt-3.5-turbo-0125", "gpt-4o" to "gpt-4o-2024-05-13") and wraps the result in the appropriate client class: OpenAIWrapper for GPT models, SiliconFlowAPI for Qwen/DeepSeek models, or HFChatModel for local LLMs. It calls load_env() to ensure API keys are available, and honors the LOCAL_LLM environment variable as an override to use a local model instead of a remote API.
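As an illustration, the resolution-plus-override step can be sketched in a few lines. This is not the code in judge_util.py: the mapping below mirrors only the two examples named above (the library's hardcoded table is larger), and resolve_judge is a hypothetical helper name.

```python
import os

# Minimal sketch of build_judge()'s shorthand resolution. The mapping
# mirrors only the examples in this document, not the library's full table.
JUDGE_VERSION_MAP = {
    "chatgpt-0125": "gpt-3.5-turbo-0125",
    "gpt-4o": "gpt-4o-2024-05-13",
}

def resolve_judge(model: str) -> str:
    """Resolve a judge shorthand, honoring the LOCAL_LLM override."""
    local = os.environ.get("LOCAL_LLM")
    if local:  # a local model name/path overrides the requested shorthand
        return local
    # Unknown shorthands pass through unchanged in this sketch
    return JUDGE_VERSION_MAP.get(model, model)
```

The override check runs first, which is why setting LOCAL_LLM redirects every judge request to the local model regardless of the shorthand passed in.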
Usage
Called internally by dataset.evaluate() methods. Can also be called directly when building a custom evaluation pipeline.
Code Reference
- Source: vlmeval/dataset/utils/judge_util.py, Lines: L7-40
- Signature:
```python
def build_judge(**kwargs) -> Union[OpenAIWrapper, SiliconFlowAPI, HFChatModel]:
    """
    Args:
        model (str): Judge name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b").
        nproc (int): Ignored (popped from kwargs).
        **kwargs: Passed to the API wrapper constructor.

    Returns:
        API wrapper instance for the judge model.
    """
```
- Import: `from vlmeval.dataset.utils.judge_util import build_judge`
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | model | str | Judge shorthand name (e.g., "chatgpt-0125", "gpt-4o", "qwen-72b") |
| Input | **kwargs | dict | Passed to the API wrapper constructor |
| Output | judge | OpenAIWrapper / SiliconFlowAPI / HFChatModel | API wrapper instance for the judge model |
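The choice among the three output wrapper classes can be sketched as a dispatch on the model-family prefix. This is an illustrative reimplementation, not the actual logic in judge_util.py; the prefix rules and the pick_wrapper helper name are inferred from the families listed in the Description.

```python
# Illustrative sketch of wrapper-class dispatch by judge model family.
# The real selection logic lives in vlmeval/dataset/utils/judge_util.py;
# the prefixes here are inferred from the families named in this document.
def pick_wrapper(model: str) -> str:
    """Return the name of the client class a given judge would use."""
    name = model.lower()
    if name.startswith(("gpt", "chatgpt")):
        return "OpenAIWrapper"   # GPT-family judges via the OpenAI API
    if name.startswith(("qwen", "deepseek")):
        return "SiliconFlowAPI"  # Qwen/DeepSeek judges via SiliconFlow
    return "HFChatModel"         # fallback: local Hugging Face chat model
```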
Usage Examples
```python
from vlmeval.dataset.utils.judge_util import build_judge

# Build a GPT-3.5 judge
judge = build_judge(model="chatgpt-0125")

# Use it to evaluate a response
response = judge.generate("Is the answer 'B' correct for this question? ...")

# Build a GPT-4o judge
judge_4o = build_judge(model="gpt-4o")
```