Implementation:Open compass VLMEvalKit MLVU

Field	Value
source	VLMEvalKit
domain	Vision, Benchmarking, Long Video Understanding

Overview

Benchmark dataset implementation for MLVU (Multi-task Long Video Understanding) evaluation in VLMEvalKit.

Description

The file defines three classes. MLVU inherits from ConcatVideoDataset and aggregates the MLVU_MCQ and MLVU_OpenEnded sub-datasets. MLVU_MCQ (TYPE 'Video-MCQ') covers the multiple-choice tasks: plotQA, needle, ego, count, anomaly recognition, topic reasoning, and order. MLVU_OpenEnded (TYPE 'Video-VQA') covers the open-ended tasks, including sub-scene and summary evaluation. Evaluation reports two aggregate scores, M-Avg and G-Avg.
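To make the two aggregate scores concrete, the following is a minimal sketch, not the code in mlvu.py. It assumes that M-Avg is the unweighted mean of per-task accuracy over the multiple-choice tasks and that G-Avg is the unweighted mean of per-task scores over the open-ended tasks. The function name macro_average, the dictionary keys, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def macro_average(per_task_scores):
    """Return the unweighted mean over tasks (each task counts once)."""
    if not per_task_scores:
        raise ValueError("no task scores to average")
    return mean(per_task_scores.values())


# Hypothetical per-task accuracies (percent) for the multiple-choice tasks.
mcq_accuracy = {
    "plotQA": 60.0,
    "needle": 70.0,
    "ego": 50.0,
    "count": 30.0,
    "anomaly_reco": 40.0,
    "topic_reasoning": 80.0,
    "order": 20.0,
}

# Hypothetical per-task judge scores for the open-ended tasks.
open_ended_score = {
    "sub_scene": 4.0,
    "summary": 3.0,
}

# Assumption: M-Avg averages the MCQ tasks, G-Avg the open-ended tasks.
m_avg = macro_average(mcq_accuracy)
g_avg = macro_average(open_ended_score)

print(f"M-Avg: {m_avg:.2f}")
print(f"G-Avg: {g_avg:.2f}")
```

Averaging per task rather than per question keeps a task with many questions from dominating the aggregate. Whether mlvu.py weights tasks this way should be checked against the source file.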

Usage

The dataset is registered in vlmeval/dataset/__init__.py and instantiated through build_dataset() by benchmark name.

Code Reference

Source: vlmeval/dataset/mlvu.py, Lines: L1-458
Import: from vlmeval.dataset.mlvu import MLVU

Signature:

class MLVU(ConcatVideoDataset):
    ...

class MLVU_MCQ(VideoBaseDataset):
    TYPE = 'Video-MCQ'
    ...

class MLVU_OpenEnded(VideoBaseDataset):
    TYPE = 'Video-VQA'
    ...
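The relationship between the concatenated class and its two sub-datasets can be illustrated with a toy sketch. The classes below are hypothetical stand-ins written for this page. They do not reproduce ConcatVideoDataset or VideoBaseDataset, and the SUB_DATASET field name and the split_predictions helper are assumptions rather than confirmed parts of the VLMEvalKit API.

```python
class ToySubDataset:
    """Stand-in for a sub-dataset such as MLVU_MCQ (not the real class)."""

    def __init__(self, name, dataset_type, records):
        self.name = name
        self.TYPE = dataset_type  # e.g. 'Video-MCQ' or 'Video-VQA'
        self.records = records


class ToyConcatDataset:
    """Stand-in for a concatenated dataset that tags rows by origin."""

    def __init__(self, sub_datasets):
        self.sub_datasets = {d.name: d for d in sub_datasets}
        self.rows = []
        for d in sub_datasets:
            for rec in d.records:
                # Re-index globally and remember which sub-dataset the row came from.
                row = {**rec, "SUB_DATASET": d.name, "index": len(self.rows)}
                self.rows.append(row)

    def __len__(self):
        return len(self.rows)

    def split_predictions(self, predictions):
        """Group predictions by sub-dataset so each can be scored by its own type."""
        groups = {name: [] for name in self.sub_datasets}
        for row, pred in zip(self.rows, predictions):
            groups[row["SUB_DATASET"]].append(pred)
        return groups


mcq = ToySubDataset("MLVU_MCQ", "Video-MCQ", [{"question": "q1"}, {"question": "q2"}])
open_ended = ToySubDataset("MLVU_OpenEnded", "Video-VQA", [{"question": "q3"}])
combined = ToyConcatDataset([mcq, open_ended])

groups = combined.split_predictions(["A", "C", "A man walks into a room."])
print(len(combined), groups)
```

Tagging each row with its origin lets a single inference pass cover both sub-datasets while multiple-choice and open-ended answers are still scored separately.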

I/O Contract

Direction	Description
Inputs	TSV dataset file with long video paths and MCQ/open-ended questions
Outputs	Evaluation results DataFrame with M-Avg and G-Avg scores

Usage Examples

from vlmeval.dataset import build_dataset
dataset = build_dataset('MLVU')
