Implementation:Open compass VLMEvalKit MEGABench Evaluator

Field	Value
source	VLMEvalKit
domain	Vision, Evaluation, Multi-task, Benchmark Orchestration

Overview

Implements the MEGABenchEvaluator class that orchestrates task-level scoring for the MEGA-Bench multi-task evaluation framework.

Description

The MEGABenchEvaluator class manages the end-to-end evaluation pipeline for MEGA-Bench. For each task it loads the task data from the TIGER-Lab/MEGA-Bench HuggingFace dataset, the corresponding model responses, and the task's metric configuration. It builds a scoring_functions dictionary that maps each task name to its metric configuration, parsed from the dataset's metric_info field. Model responses are matched to their evaluation contexts via query indices, and scoring is delegated to task-specific scoring functions. Results are written to a JSON output file, and a temporary pickle checkpoint stores intermediate results so that evaluation can proceed incrementally.
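The task-to-metric mapping described above can be pictured with a minimal sketch. It assumes, hypothetically, that each dataset row carries a `task_name` and a `metric_info` field stored as a JSON string; the helper name `build_scoring_functions` and the metric names in the sample rows are illustrative only and are not VLMEvalKit's actual API or MEGA-Bench's real metric schema.

```python
import json
from typing import Any, Dict, Iterable, Mapping


def build_scoring_functions(rows: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each task name to its parsed metric configuration.

    Hypothetical sketch: assumes every row has a ``task_name`` and a
    ``metric_info`` value (JSON string or dict). Rows of the same task are
    assumed to share one config, so only the first occurrence is parsed.
    """
    scoring_functions: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        task = row["task_name"]
        if task in scoring_functions:
            continue  # config for this task was already parsed
        info = row["metric_info"]
        # Accept either an already-decoded mapping or a JSON string.
        scoring_functions[task] = json.loads(info) if isinstance(info, str) else dict(info)
    return scoring_functions


# Made-up rows; the metric names are placeholders, not real MEGA-Bench metrics.
rows = [
    {"task_name": "chart_qa", "metric_info": '{"metric": "exact_match"}'},
    {"task_name": "chart_qa", "metric_info": '{"metric": "exact_match"}'},
    {"task_name": "ocr_math", "metric_info": {"metric": "numeric_match"}},
]
scoring_functions = build_scoring_functions(rows)
print(scoring_functions)
```

Parsing once per task keeps the dictionary small even when a task contributes many queries.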

Usage

Called internally by the MEGA-Bench dataset class during multi-task evaluation.

Code Reference

Source: vlmeval/dataset/utils/megabench/evaluator.py, Lines: L1-399
Import: from vlmeval.dataset.utils.megabench.evaluator import MEGABenchEvaluator

Key Functions:

class MEGABenchEvaluator:
    def __init__(self, subset_name, responses_file, output_file): ...
    def _load_hf(self, subset_name): ...
    def _get_eval_context(self, task_name, query): ...
    def evaluate(self): ...
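The query-index matching that a method like _get_eval_context performs can be illustrated with a small sketch. Everything here is an assumption for illustration: the record field names (`task_name`, `query_idx`, `response`, `answer`) and the helper `match_responses` are hypothetical and do not reproduce the real method's signature or return value.

```python
from typing import Any, Dict, List, Mapping, Tuple

Key = Tuple[str, int]
Record = Mapping[str, Any]


def index_by_query(records: List[Record]) -> Dict[Key, Record]:
    """Index records by (task_name, query_idx); field names are hypothetical."""
    return {(r["task_name"], int(r["query_idx"])): r for r in records}


def match_responses(
    contexts: List[Record], responses: List[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Key]]:
    """Pair each response with the evaluation context that shares its query index.

    Returns the matched (context, response) pairs and the keys of responses
    for which no context exists.
    """
    ctx_index = index_by_query(contexts)
    matched: List[Tuple[Record, Record]] = []
    missing: List[Key] = []
    for resp in responses:
        key = (resp["task_name"], int(resp["query_idx"]))
        ctx = ctx_index.get(key)
        if ctx is None:
            missing.append(key)  # response without a matching context
        else:
            matched.append((ctx, resp))
    return matched, missing


contexts = [
    {"task_name": "chart_qa", "query_idx": 0, "answer": "42"},
    {"task_name": "chart_qa", "query_idx": 1, "answer": "7"},
]
responses = [
    {"task_name": "chart_qa", "query_idx": 1, "response": "7"},
    {"task_name": "chart_qa", "query_idx": 5, "response": "x"},
]
matched, missing = match_responses(contexts, responses)
```

Keying on the (task, query index) pair rather than list position keeps the pairing correct even if responses arrive out of order or some are missing.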

I/O Contract

Direction	Description
Inputs	Subset name for HuggingFace dataset loading; model responses JSON file path; output file path
Outputs	JSON file with per-task evaluation scores; temporary pickle checkpoint file for incremental evaluation
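The incremental-evaluation pattern implied by the outputs (a temporary pickle checkpoint plus a final JSON file) can be sketched generically. This is a hypothetical illustration, not the evaluator's code: the function `evaluate_with_checkpoint`, the `.tmp.pkl` suffix, and the (item_id, payload) input shape are all assumptions.

```python
import json
import os
import pickle
from typing import Any, Callable, Dict, Iterable, Tuple


def evaluate_with_checkpoint(
    items: Iterable[Tuple[str, Any]],
    score_fn: Callable[[Any], float],
    output_file: str,
) -> Dict[str, float]:
    """Score items incrementally, caching finished scores in a pickle checkpoint.

    Hypothetical sketch: the checkpoint path is derived from output_file and
    the checkpoint is deleted once the final JSON has been written.
    """
    ckpt_file = output_file + ".tmp.pkl"
    results: Dict[str, float] = {}
    if os.path.exists(ckpt_file):
        # Resume: reuse scores computed by an earlier, interrupted run.
        with open(ckpt_file, "rb") as f:
            results = pickle.load(f)
    for item_id, payload in items:
        if item_id in results:
            continue  # already scored in a previous run
        results[item_id] = score_fn(payload)
        # Persist after every item so a crash loses at most the current score.
        with open(ckpt_file, "wb") as f:
            pickle.dump(results, f)
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    if os.path.exists(ckpt_file):
        os.remove(ckpt_file)  # checkpoint is only needed until the JSON exists
    return results
```

Rewriting the checkpoint after every item trades some I/O for the ability to resume from almost exactly where an interrupted run stopped.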

Usage Examples

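# Internal usage example; normally invoked by the MEGA-Bench dataset class
from vlmeval.dataset.utils.megabench.evaluator import MEGABenchEvaluator
# Positional arguments: subset_name, responses_file, output_file
evaluator = MEGABenchEvaluator("core", "responses.json", "results.json")
evaluator.evaluate()  # writes per-task scores to results.json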

Related Pages

Principle:Open_compass_VLMEvalKit_Benchmark_Dataset_Construction
