
Implementation:Open compass VLMEvalKit MEGABench Evaluator

From Leeroopedia
Revision as of 13:30, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Open_compass_VLMEvalKit_MEGABench_Evaluator.md)
Field | Value
source | VLMEvalKit
domain | Vision, Evaluation, Multi-task, Benchmark Orchestration

Overview

Implements the MEGABenchEvaluator class that orchestrates task-level scoring for the MEGA-Bench multi-task evaluation framework.

Description

The MEGABenchEvaluator class manages the end-to-end evaluation pipeline for MEGA-Bench. For each task it loads the task data from the TIGER-Lab/MEGA-Bench HuggingFace dataset, the model responses, and the metric configuration. It builds a scoring_functions dictionary that maps each task name to its metric configuration, parsed from the dataset's metric_info field. Model responses are matched to their evaluation contexts by query index, and scoring is delegated to task-specific scoring functions. Results are written to a JSON output file, with a temporary pickle file serving as a checkpoint during evaluation.
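The two bookkeeping steps described above, building the task-to-metric mapping and pairing responses with queries by index, might look roughly like the sketch below. It is not the VLMEvalKit code: the helper names build_scoring_functions and match_responses, the row keys task_name and query_idx, and the assumption that metric_info is stored as a JSON string are all hypothetical and chosen only for illustration.

```python
import json


def build_scoring_functions(rows):
    """Map each task name to its parsed metric configuration.

    Assumes (hypothetically) that every row carries a "task_name" key and a
    "metric_info" key holding a JSON string.
    """
    scoring_functions = {}
    for row in rows:
        task_name = row["task_name"]
        # Parse the configuration once per task; later rows of the same task
        # are assumed to repeat the same metric_info.
        if task_name not in scoring_functions:
            scoring_functions[task_name] = json.loads(row["metric_info"])
    return scoring_functions


def match_responses(queries, responses):
    """Pair each query with the response sharing its (hypothetical) query_idx.

    A query without a matching response is paired with None so the caller
    can decide how to score a missing answer.
    """
    by_idx = {resp["query_idx"]: resp for resp in responses}
    return [(query, by_idx.get(query["query_idx"])) for query in queries]
```

Keying the responses by index first keeps the matching step linear in the number of queries, and it makes missing responses explicit instead of silently shifting the alignment.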

Usage

Called internally by the MEGA-Bench dataset class during multi-task evaluation.

Code Reference

  • Source: vlmeval/dataset/utils/megabench/evaluator.py, Lines: L1-399
  • Import: from vlmeval.dataset.utils.megabench.evaluator import MEGABenchEvaluator

Key Functions:

class MEGABenchEvaluator:
    def __init__(self, subset_name, responses_file, output_file): ...
    def _load_hf(self, subset_name): ...
    def _get_eval_context(self, task_name, query): ...
    def evaluate(self): ...

I/O Contract

Direction | Description
Inputs | Subset name for HuggingFace dataset loading; model responses JSON file path; output file path
Outputs | JSON file with per-task evaluation scores; temporary pickle checkpoint file for incremental evaluation
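The output side of this contract, a JSON results file plus a temporary pickle checkpoint, can be illustrated with a minimal sketch of incremental evaluation. Everything here is an assumption made for the example: the function name evaluate_with_checkpoint, the ".pkl" suffix for the checkpoint path, the score_task callback, and the choice to delete the checkpoint once the JSON file is written. The actual evaluator may differ on each point.

```python
import json
import os
import pickle


def evaluate_with_checkpoint(task_names, score_task, output_file):
    """Score tasks one by one, resuming from a pickle checkpoint if present."""
    checkpoint_file = output_file + ".pkl"  # hypothetical naming convention

    # Resume: reuse scores saved by an earlier, interrupted run.
    results = {}
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, "rb") as f:
            results = pickle.load(f)

    for name in task_names:
        if name in results:
            continue  # already scored in a previous run
        results[name] = score_task(name)
        # Persist after every task so a crash loses at most one task's work.
        with open(checkpoint_file, "wb") as f:
            pickle.dump(results, f)

    # Write the final per-task scores as JSON.
    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)

    # Assumption: the checkpoint is temporary and is removed after success.
    if os.path.exists(checkpoint_file):
        os.remove(checkpoint_file)
    return results
```

Saving the checkpoint after each task trades some extra disk writes for the ability to restart a long multi-task run without rescoring finished tasks.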

Usage Examples

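# Internal usage example: score saved model responses for one subset
from vlmeval.dataset.utils.megabench.evaluator import MEGABenchEvaluator
evaluator = MEGABenchEvaluator(
    subset_name="core",               # subset of the HuggingFace dataset to load
    responses_file="responses.json",  # model responses to score
    output_file="results.json",       # where per-task scores are written
)
evaluator.evaluate()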
