Implementation:Open compass VLMEvalKit MEGABench Evaluator
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Multi-task, Benchmark Orchestration |
Overview
Implements the MEGABenchEvaluator class that orchestrates task-level scoring for the MEGA-Bench multi-task evaluation framework.
Description
The MEGABenchEvaluator class manages the end-to-end evaluation pipeline for MEGA-Bench. It loads task data from the TIGER-Lab/MEGA-Bench HuggingFace dataset together with the model responses, and builds a scoring_functions dictionary that maps each task name to its metric configuration (parsed from the dataset's metric_info field). Model responses are matched to their evaluation contexts via query indices, and scoring is delegated to task-specific scoring functions. Results are written to a JSON output file, with temporary pickle checkpoints used during evaluation.
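The two bookkeeping steps described above (building the per-task metric map and pairing responses with their queries) can be illustrated with a minimal sketch. It is not the code in evaluator.py: the row fields `task_name` and `query_idx`, the JSON encoding of `metric_info`, the metric names, and the helpers `build_scoring_functions` and `match_responses` are all assumptions made for illustration.

```python
import json

# Hypothetical rows standing in for the HuggingFace dataset.
# Field names and the JSON-encoded metric_info are assumptions, not the real schema.
rows = [
    {"task_name": "chart_qa", "query_idx": 0,
     "metric_info": '{"field_score_function": {"answer": "exact_str_match"}}'},
    {"task_name": "chart_qa", "query_idx": 1,
     "metric_info": '{"field_score_function": {"answer": "exact_str_match"}}'},
    {"task_name": "ocr_math", "query_idx": 0,
     "metric_info": '{"field_score_function": {"answer": "number_rel_diff"}}'},
]

# Hypothetical model responses, keyed by the same task name and query index.
responses = [
    {"task_name": "chart_qa", "query_idx": 1, "response": "42"},
    {"task_name": "ocr_math", "query_idx": 0, "response": "3.14"},
    {"task_name": "unknown_task", "query_idx": 0, "response": "n/a"},
]


def build_scoring_functions(rows):
    """Map each task name to its parsed metric configuration."""
    scoring_functions = {}
    for row in rows:
        task = row["task_name"]
        # Parse the configuration once per task; later rows of the same task are skipped.
        if task not in scoring_functions:
            scoring_functions[task] = json.loads(row["metric_info"])
    return scoring_functions


def match_responses(rows, responses):
    """Pair each response with its evaluation context via (task name, query index)."""
    contexts = {(r["task_name"], r["query_idx"]): r for r in rows}
    pairs = []
    for resp in responses:
        key = (resp["task_name"], resp["query_idx"])
        if key in contexts:  # responses without a matching query are dropped
            pairs.append((contexts[key], resp))
    return pairs


scoring_functions = build_scoring_functions(rows)
pairs = match_responses(rows, responses)
```

Keying the lookup on the (task, index) pair keeps matching independent of the order in which responses were written, which is one plausible reason to match by query index rather than by position.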
Usage
Called internally by the MEGA-Bench dataset class during multi-task evaluation.
Code Reference
- Source: vlmeval/dataset/utils/megabench/evaluator.py, Lines: L1-399
- Import: from vlmeval.dataset.utils.megabench.evaluator import MEGABenchEvaluator
Key Functions:
class MEGABenchEvaluator:
    def __init__(self, subset_name, responses_file, output_file): ...
    def _load_hf(self, subset_name): ...
    def _get_eval_context(self, task_name, query): ...
    def evaluate(self): ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Subset name for HuggingFace dataset loading; model responses JSON file path; output file path |
| Outputs | JSON file with per-task evaluation scores; temporary pickle checkpoint file for incremental evaluation |
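The output contract (a final JSON file plus a temporary pickle checkpoint) can be sketched as follows. This is a hedged illustration of a generic checkpoint-and-resume pattern, not the evaluator's actual logic: the function `evaluate_with_checkpoint`, its parameters, the per-task checkpoint granularity, and the removal of the checkpoint after a successful run are all assumptions.

```python
import json
import os
import pickle
import tempfile


def evaluate_with_checkpoint(tasks, score_task, output_file, checkpoint_file):
    """Score tasks one by one, checkpointing after each so a rerun can resume.

    `score_task` is a hypothetical callable that returns a JSON-serializable
    score for a task name.
    """
    # Resume from an earlier, interrupted run if a checkpoint exists.
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, "rb") as f:
            results = pickle.load(f)
    else:
        results = {}

    for task in tasks:
        if task in results:
            continue  # already scored in a previous run
        results[task] = score_task(task)
        # Persist progress after every task.
        with open(checkpoint_file, "wb") as f:
            pickle.dump(results, f)

    # Write the final per-task scores as JSON.
    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)

    # Assumption: the checkpoint is temporary and is removed once the JSON exists.
    if os.path.exists(checkpoint_file):
        os.remove(checkpoint_file)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scores = evaluate_with_checkpoint(
            ["task_a", "task_b"],
            lambda task: len(task) / 10,  # placeholder scoring function
            os.path.join(tmp, "results.json"),
            os.path.join(tmp, "results.pkl"),
        )
        print(scores)
```

Only load pickle checkpoints that your own run produced, since unpickling untrusted files can execute arbitrary code.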
Usage Examples
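# Internal usage example
from vlmeval.dataset.utils.megabench.evaluator import MEGABenchEvaluator

# Arguments: subset name, model responses JSON file, output JSON file
evaluator = MEGABenchEvaluator("core", "responses.json", "results.json")
# Run the evaluation; per-task scores are written to results.json
evaluator.evaluate()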