
Implementation:FlagOpen FlagEmbedding MLVU Open Bench

From Leeroopedia


Knowledge Sources
Domains: Video Understanding, Benchmark, Machine Learning
Last Updated: 2026-02-09 00:00 GMT

Overview

Dataset loader and evaluation framework for open-ended generation tasks in the MLVU video understanding benchmark.

Description

This module provides the MLVU dataset class and evaluation infrastructure for open-ended video question answering tasks, specifically for sub-plot analysis and video summarization. It implements a PyTorch Dataset that loads video paths, questions, and ground truth answers from JSON files, preparing data for multimodal language model inference.

The dataset handles multiple task types (subPlot and summary) and provides utilities for formatting prompts with conversation templates. It includes frame index calculation for temporal video sampling and statistical reporting of dataset composition. The main function demonstrates the inference workflow template for running models and collecting predictions.
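The frame index calculation mentioned above is typically a uniform temporal sampling: a fixed number of frames is drawn evenly across the video's length. A minimal sketch of this common strategy (the function name and signature are assumptions, not this module's actual API):

```python
import numpy as np

def get_frame_indices(num_frames: int, total_frames: int) -> np.ndarray:
    """Uniformly sample `num_frames` indices across a video of `total_frames`.

    Each index is taken from the midpoint of an equal-length segment,
    a common temporal-sampling strategy in video benchmarks.
    """
    seg_size = float(total_frames) / num_frames
    return np.array([int(seg_size * (i + 0.5)) for i in range(num_frames)])
```

For a 100-frame video sampled at 4 frames, this yields segment midpoints rather than the first frames of each segment, which avoids biasing samples toward scene starts.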

Usage

Use this class to load MLVU benchmark data for evaluating multimodal language models on long video understanding tasks with open-ended generation. Integrate with your model's inference pipeline to generate predictions that can be evaluated using the companion evaluation scripts.

Code Reference

Source Location

Signature

class MLVU(Dataset):
    def __init__(self, data_dir, data_list):
        """Initialize MLVU dataset"""

    def __getitem__(self, idx):
        """Get a single data sample"""

    def qa_template(self, data):
        """Format question and answer"""

def get_prompt2(conv):
    """Generate prompt from conversation template"""

Import

import torch
from torch.utils.data import Dataset
import json
from tqdm import tqdm
import numpy as np

I/O Contract

Inputs

Name Type Required Description
data_dir str Yes Directory containing JSON annotation files
data_list dict Yes Dictionary mapping task types to (json_file, video_prefix, data_type, flag)

Outputs

Name Type Description
video str Path to video file
question str Question text
answer str Ground truth answer
task_type str Type of task (subPlot or summary)

Usage Examples

# Dataset initialization
data_list = {
    "subPlot": ("8_sub_scene.json", "MLVU_all/video/subPlot", "video", False),
    "summary": ("9_summary.json", "MLVU_all/video/summary", "video", False)
}

data_dir = "MLVU_all/json"
dataset = MLVU(data_dir, data_list)

# Print dataset statistics
print(dataset)

# Iterate through dataset and collect predictions
results = []
for example in dataset:
    video_path = example["video"]
    question = example["question"]
    answer = example["answer"]
    task_type = example["task_type"]

    # Run your model inference here
    pred = model.generate(video_path, question)

    # Collect results for the companion evaluation scripts
    results.append({
        "video_name": video_path.split("/")[-1],
        "Q": question,
        "A": answer,
        "pred": pred
    })
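Collected predictions are typically serialized to JSON for the companion evaluation scripts. A minimal sketch, assuming an output file name of your choosing (the name and record values below are illustrative, not fixed by the module):

```python
import json

# Illustrative record in the shape the loop above produces.
results = [
    {"video_name": "clip_001.mp4",
     "Q": "What happens in this scene?",
     "A": "A chase.",
     "pred": "A chase scene."}
]

# File name is an assumption; use whatever your evaluation script expects.
with open("mlvu_predictions.json", "w") as f:
    json.dump(results, f, indent=2)
```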
