

Implementation:Open compass VLMEvalKit MLVU

From Leeroopedia
Field Value
source VLMEvalKit
domain Vision, Benchmarking, Long Video Understanding

Overview

Benchmark dataset implementation for MLVU (Multi-task Long Video Understanding) evaluation in VLMEvalKit.

Description

The file defines three classes. MLVU inherits from ConcatVideoDataset and aggregates the MLVU_MCQ and MLVU_OpenEnded sub-datasets. MLVU_MCQ (TYPE 'Video-MCQ') covers the multiple-choice tasks: plotQA, needle, ego, count, anomaly recognition, topic reasoning, and order. MLVU_OpenEnded (TYPE 'Video-VQA') covers the open-ended tasks, including sub-scene and summary evaluation. The implementation computes M-Avg and G-Avg aggregate scores.
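As a rough illustration of the aggregate scores mentioned above, the sketch below assumes that M-Avg is the unweighted mean over the multiple-choice task scores and G-Avg is the unweighted mean over the open-ended task scores. That definition, the dictionary keys, and every number are assumptions made for illustration; none of them is taken from vlmeval/dataset/mlvu.py.

```python
from statistics import mean

# Hypothetical per-task scores. The values are placeholders, not real results,
# and the key names are illustrative rather than the ones used in mlvu.py.
mcq_scores = {
    "plotQA": 60.0,
    "needle": 70.0,
    "ego": 50.0,
    "count": 30.0,
    "anomaly_reco": 65.0,
    "topic_reasoning": 85.0,
    "order": 40.0,
}
open_ended_scores = {"sub_scene": 4.0, "summary": 3.0}


def aggregate(mcq, open_ended):
    """Return (M-Avg, G-Avg) as unweighted per-task means (assumed definition)."""
    m_avg = mean(mcq.values())
    g_avg = mean(open_ended.values())
    return m_avg, g_avg


m_avg, g_avg = aggregate(mcq_scores, open_ended_scores)
print(f"M-Avg: {m_avg:.2f}  G-Avg: {g_avg:.2f}")
```

Keeping the two averages separate reflects that the multiple-choice and open-ended tasks are scored on different scales, so a single combined mean would not be meaningful.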

Usage

MLVU is registered in vlmeval/dataset/__init__.py and is instantiated through build_dataset() using the benchmark name.

Code Reference

  • Source: vlmeval/dataset/mlvu.py, Lines: L1-458
  • Import: from vlmeval.dataset.mlvu import MLVU

Signature:

class MLVU(ConcatVideoDataset):
    ...

class MLVU_MCQ(VideoBaseDataset):
    TYPE = 'Video-MCQ'
    ...

class MLVU_OpenEnded(VideoBaseDataset):
    TYPE = 'Video-VQA'
    ...
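The relationship between the three classes can be pictured with a toy concatenation pattern. This is a minimal sketch under stated assumptions: ToySubDataset and ToyConcatDataset are hypothetical stand-ins written for this page, and they do not reproduce the actual ConcatVideoDataset or VideoBaseDataset code in VLMEvalKit.

```python
class ToySubDataset:
    """Hypothetical stand-in for a sub-dataset such as MLVU_MCQ or MLVU_OpenEnded."""

    def __init__(self, dataset_type, rows):
        self.TYPE = dataset_type
        self.rows = rows

    def __len__(self):
        return len(self.rows)


class ToyConcatDataset:
    """Hypothetical stand-in for a concatenating dataset; not ConcatVideoDataset itself."""

    def __init__(self, parts):
        self.parts = parts
        # Flat index of (sub-dataset name, local row index) pairs.
        self.index = [
            (name, i) for name, part in parts.items() for i in range(len(part))
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        name, local = self.index[idx]
        part = self.parts[name]
        # Tag each row with its origin so results can be routed back per sub-dataset.
        return {**part.rows[local], "sub_dataset": name, "type": part.TYPE}


mcq = ToySubDataset(
    "Video-MCQ",
    [{"question": "Which event happens first?"},
     {"question": "How many times does the door open?"}],
)
open_ended = ToySubDataset("Video-VQA", [{"question": "Summarize the video."}])
combined = ToyConcatDataset({"MLVU_MCQ": mcq, "MLVU_OpenEnded": open_ended})

for i in range(len(combined)):
    print(combined[i]["sub_dataset"], combined[i]["type"])
```

Tagging each row with its originating sub-dataset is one way to let a combined dataset evaluate multiple-choice and open-ended items with different scoring logic and then merge the results.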

I/O Contract

Direction Description
Inputs TSV dataset file with long video paths and MCQ/open-ended questions
Outputs Evaluation results DataFrame with M-Avg and G-Avg scores

Usage Examples

from vlmeval.dataset import build_dataset

# Build the concatenated MLVU benchmark (MCQ + open-ended) by its registered name.
dataset = build_dataset('MLVU')
