Implementation:FlagOpen FlagEmbedding MLVU Open Bench
| Knowledge Sources | |
|---|---|
| Domains | Video Understanding, Benchmark, Machine Learning |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Dataset loader and evaluation framework for open-ended generation tasks in the MLVU video understanding benchmark.
Description
This module provides the MLVU dataset class and evaluation infrastructure for open-ended video question answering tasks, specifically for sub-plot analysis and video summarization. It implements a PyTorch Dataset that loads video paths, questions, and ground truth answers from JSON files, preparing data for multimodal language model inference.
The dataset handles multiple task types (subPlot and summary) and provides utilities for formatting prompts with conversation templates. It includes frame index calculation for temporal video sampling and statistical reporting of dataset composition. The module's main function provides a template inference workflow for running a model over the dataset and collecting predictions.
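The frame index calculation for temporal sampling typically selects a fixed number of frames spread evenly across the video. A minimal sketch is below; `get_frame_indices` is a hypothetical helper name used for illustration, not the module's actual function:

```python
def get_frame_indices(num_frames, total_frames):
    """Uniformly sample `num_frames` indices across a video.

    Splits the video into `num_frames` equal segments and takes the
    midpoint frame of each, a common temporal-sampling scheme.
    """
    seg_size = total_frames / num_frames
    return [int(seg_size * i + seg_size / 2) for i in range(num_frames)]
```

The sampled indices are then used to decode only those frames before passing them to the multimodal model.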
Usage
Use this class to load MLVU benchmark data for evaluating multimodal language models on long video understanding tasks with open-ended generation. Integrate with your model's inference pipeline to generate predictions that can be evaluated using the companion evaluation scripts.
Code Reference
Source Location
- Repository: FlagOpen_FlagEmbedding
- File: research/MLVU/evaluation/generation_evaluation/open_bench.py
- Lines: 1-167
Signature
class MLVU(Dataset):
    def __init__(self, data_dir, data_list):
        """Initialize MLVU dataset"""
    def __getitem__(self, idx):
        """Get a single data sample"""
    def qa_template(self, data):
        """Format question and answer"""

def get_prompt2(conv):
    """Generate prompt from conversation template"""
Import
import torch
from torch.utils.data import Dataset
import json
from tqdm import tqdm
import numpy as np
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_dir | str | Yes | Directory containing JSON annotation files |
| data_list | dict | Yes | Dictionary mapping task types to (json_file, video_prefix, data_type, flag) |
Outputs
| Name | Type | Description |
|---|---|---|
| video | str | Path to video file |
| question | str | Question text |
| answer | str | Ground truth answer |
| task_type | str | Type of task (subPlot or summary) |
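To make the output contract concrete, a hypothetical sketch of how a sample could be assembled from one annotation entry is shown below. The annotation field names (`"video"`, `"question"`, `"answer"`) are assumptions for illustration, not confirmed from the source file:

```python
def build_sample(entry, video_prefix, task_type):
    # Assemble one dataset sample matching the outputs table:
    # joins the video prefix with the per-entry filename and carries
    # the question, ground-truth answer, and task type through.
    return {
        "video": f"{video_prefix}/{entry['video']}",
        "question": entry["question"],
        "answer": entry["answer"],
        "task_type": task_type,
    }
```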
Usage Examples
# Dataset initialization
data_list = {
    "subPlot": ("8_sub_scene.json", "MLVU_all/video/subPlot", "video", False),
    "summary": ("9_summary.json", "MLVU_all/video/summary", "video", False)
}
data_dir = "MLVU_all/json"
dataset = MLVU(data_dir, data_list)

# Print dataset statistics
print(dataset)

# Iterate through dataset
for example in dataset:
    video_path = example["video"]
    question = example["question"]
    answer = example["answer"]
    task_type = example["task_type"]

    # Run your model inference here
    # pred = model.generate(video_path, question)

    # Collect results
    result = {
        "video_name": video_path.split("/")[-1],
        "Q": question,
        "A": answer,
        "pred": pred
    }
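After the loop, the collected results are typically serialized to JSON for the companion evaluation scripts. A minimal sketch, assuming the scripts consume a JSON list of result dicts (the output filename here is an assumption):

```python
import json

# Collected per-sample results; a single illustrative entry is shown.
results = [
    {
        "video_name": "clip_001.mp4",
        "Q": "What happens in the scene?",
        "A": "ground-truth answer",
        "pred": "model output",
    },
]

# Write predictions to disk for downstream evaluation.
with open("open_bench_predictions.json", "w") as f:
    json.dump(results, f, indent=4)
```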