Implementation:Open compass VLMEvalKit MLVU
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Benchmarking, Long Video Understanding |
Overview
Benchmark dataset implementation for MLVU (Multi-task Long Video Understanding) evaluation in VLMEvalKit.
Description
The file defines three classes. MLVU inherits from ConcatVideoDataset and aggregates the two sub-datasets, MLVU_MCQ and MLVU_OpenEnded. MLVU_MCQ (TYPE 'Video-MCQ') covers the multiple-choice tasks: plotQA, needle, ego, count, anomaly recognition, topic reasoning, and order. MLVU_OpenEnded (TYPE 'Video-VQA') covers the open-ended tasks, including sub-scene and summary. Evaluation reports two aggregate scores: M-Avg, the average over the multiple-choice tasks, and G-Avg, the average over the open-ended (generation) tasks.
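The two aggregate scores can be illustrated with a minimal sketch. It assumes that M-Avg is the unweighted mean of per-task multiple-choice accuracy and that G-Avg is the unweighted mean of per-task open-ended scores; the actual weighting and scaling in mlvu.py may differ. The function name `aggregate_mlvu_scores` and all numbers below are hypothetical and serve only as placeholders.

```python
from statistics import mean
from typing import Dict


def aggregate_mlvu_scores(
    mcq_accuracy: Dict[str, float],
    open_ended_score: Dict[str, float],
) -> Dict[str, float]:
    """Hypothetical helper: unweighted per-task means (an assumption,
    not taken from mlvu.py)."""
    return {
        "M-Avg": mean(mcq_accuracy.values()),      # multiple-choice tasks
        "G-Avg": mean(open_ended_score.values()),  # open-ended tasks
    }


# Placeholder per-task values, made up for illustration only.
mcq = {"plotQA": 50.0, "needle": 100.0}
gen = {"sub_scene": 4.0, "summary": 6.0}

result = aggregate_mlvu_scores(mcq, gen)
print(result)
```

Keeping the two averages separate reflects that multiple-choice accuracy and open-ended scores are measured on different scales, so they are not merged into a single number.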
Usage
Registered in vlmeval/dataset/__init__.py and invoked through build_dataset() by benchmark name.
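Lookup by benchmark name can be pictured as a name-to-class registry. The sketch below is a toy illustration of that pattern only; it is not VLMEvalKit's code, and `toy_build_dataset`, `register`, `ToyMLVU`, and `_REGISTRY` are hypothetical names introduced here.

```python
from typing import Dict, Type


class ToyDataset:
    """Hypothetical stand-in for a dataset base class."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical registry mapping benchmark names to dataset classes.
_REGISTRY: Dict[str, Type[ToyDataset]] = {}


def register(name: str):
    """Class decorator that records a dataset class under a benchmark name."""
    def decorator(cls: Type[ToyDataset]) -> Type[ToyDataset]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("MLVU")
class ToyMLVU(ToyDataset):
    pass


def toy_build_dataset(name: str) -> ToyDataset:
    """Instantiate the dataset class registered under `name`."""
    if name not in _REGISTRY:
        raise KeyError(f"Unknown benchmark name: {name!r}")
    return _REGISTRY[name](name)


dataset = toy_build_dataset("MLVU")
print(type(dataset).__name__, dataset.name)
```

With this pattern, callers only need the benchmark name as a string; the mapping from name to class lives in one place.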
Code Reference
- Source: vlmeval/dataset/mlvu.py, Lines: L1-458
- Import: from vlmeval.dataset.mlvu import MLVU
Signature:
class MLVU(ConcatVideoDataset):
    ...

class MLVU_MCQ(VideoBaseDataset):
    TYPE = 'Video-MCQ'
    ...

class MLVU_OpenEnded(VideoBaseDataset):
    TYPE = 'Video-VQA'
    ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | TSV dataset file with long video paths and MCQ/open-ended questions |
| Outputs | Evaluation results DataFrame with M-Avg and G-Avg scores |
Usage Examples
from vlmeval.dataset import build_dataset
dataset = build_dataset('MLVU')