# FlagOpen FlagEmbedding MLVU PlotQA Data
| Knowledge Sources | |
|---|---|
| Domains | Video Understanding, Benchmark Data, Question Answering |
| Last Updated | 2026-02-09 00:00 GMT |
## Overview
Benchmark dataset for plot-based question answering on long videos in the MLVU evaluation framework.
## Description

The MLVU PlotQA dataset is part of the MLVU (Multi-task Long Video Understanding) benchmark, containing multiple-choice questions that test understanding of video plots and narratives. Each entry includes a video reference, duration information, a question about the video content, four candidate answers, and the correct answer. Questions focus on plot elements such as character appearances, actions, events, and visual details that require comprehensive video understanding.

The dataset is stored as line-delimited JSON with 7,009 lines, one record per question. Answering each question requires watching and comprehending long-form video content, which tests a model's ability to track narrative elements, identify key visual details, and reason about plot progression over extended video sequences.
## Usage
Use this dataset for evaluating video understanding models on plot comprehension tasks, benchmarking multi-modal models on long-form video question answering, or training systems for narrative understanding in videos.
## Code Reference

### Source Location

- Repository: FlagOpen_FlagEmbedding
- File: research/MLVU/data/1_plotQA.json
### Data Structure

```python
{
    "video": str,             # Video filename (e.g., "movie101_66.mp4")
    "duration": int,          # Video duration in seconds
    "question": str,          # Question about the video plot
    "candidates": List[str],  # Four candidate answers
    "answer": str,            # Correct answer (one of the candidates)
    "question_type": str      # Always "plotQA" for this dataset
}
```
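As a sanity check when loading the data, the schema above can be validated programmatically. The sketch below is illustrative only: `validate_record` is a hypothetical helper, not part of the FlagEmbedding repository, and the sample values are taken from the example entry shown later on this page.

```python
from typing import Dict

def validate_record(record: Dict) -> bool:
    """Check that a record matches the documented PlotQA schema.

    Hypothetical helper for illustration; not part of FlagEmbedding.
    """
    required = {"video": str, "duration": int, "question": str,
                "candidates": list, "answer": str, "question_type": str}
    for key, typ in required.items():
        if not isinstance(record.get(key), typ):
            return False
    return (len(record["candidates"]) == 4
            and record["answer"] in record["candidates"]
            and record["question_type"] == "plotQA")

# Sample record using the example entry documented below
sample = {
    "video": "movie101_66.mp4",
    "duration": 246,
    "question": "What color is the main male character in the video?",
    "candidates": ["Yellow", "Red", "Green", "Blue"],
    "answer": "Yellow",
    "question_type": "plotQA",
}
print(validate_record(sample))  # True
```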
### Import

```python
import json

# Load the dataset (one JSON object per line)
with open("research/MLVU/data/1_plotQA.json", "r") as f:
    data = [json.loads(line) for line in f]
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | str | Yes | Path to the JSON data file |
### Outputs
| Field | Type | Description |
|---|---|---|
| video | str | Video filename identifier |
| duration | int | Length of video in seconds |
| question | str | Question text about the video plot |
| candidates | List[str] | List of 4 possible answers |
| answer | str | The correct answer string |
| question_type | str | Type identifier ("plotQA") |
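Since `answer` is stored as a string that also appears in `candidates`, multiple-choice evaluation code often presents the candidates as lettered options and maps the ground truth back to a letter. A minimal sketch of that mapping follows; both helper names are assumptions for illustration, not functions from the repository:

```python
from typing import List

def format_options(candidates: List[str]) -> str:
    """Render candidates as 'A. ...' lines for a multiple-choice prompt."""
    return "\n".join(f"{chr(ord('A') + i)}. {c}"
                     for i, c in enumerate(candidates))

def answer_letter(candidates: List[str], answer: str) -> str:
    """Map the ground-truth answer string back to its option letter."""
    return chr(ord('A') + candidates.index(answer))

candidates = ["Yellow", "Red", "Green", "Blue"]
print(format_options(candidates))
print(answer_letter(candidates, "Yellow"))  # A
```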
## Usage Examples

```python
import json
from typing import Dict, List

# Load the PlotQA dataset (one JSON object per line)
def load_plotqa_data(file_path: str) -> List[Dict]:
    with open(file_path, "r") as f:
        return [json.loads(line) for line in f]

data = load_plotqa_data("research/MLVU/data/1_plotQA.json")

# Inspect an example entry
example = data[0]
print(f"Video: {example['video']}")
print(f"Duration: {example['duration']}s")
print(f"Question: {example['question']}")
print(f"Candidates: {example['candidates']}")
print(f"Answer: {example['answer']}")
# Output:
# Video: movie101_66.mp4
# Duration: 246s
# Question: What color is the main male character in the video?
# Candidates: ['Yellow', 'Red', 'Green', 'Blue']
# Answer: Yellow

# Evaluate a model on multiple-choice accuracy
def evaluate_plotqa(model, data: List[Dict]) -> float:
    correct = 0
    total = len(data)
    for item in data:
        video_path = f"videos/{item['video']}"
        question = item['question']
        candidates = item['candidates']
        correct_answer = item['answer']
        # Model prediction (pseudo-code; depends on your model's API)
        predicted_answer = model.predict(video_path, question, candidates)
        if predicted_answer == correct_answer:
            correct += 1
    accuracy = correct / total
    return accuracy

# Filter by video duration
short_videos = [item for item in data if item['duration'] < 300]   # < 5 minutes
long_videos = [item for item in data if item['duration'] >= 600]   # >= 10 minutes
print(f"Short videos: {len(short_videos)}")
print(f"Long videos: {len(long_videos)}")

# Analyze question types
questions_about_color = [
    item for item in data
    if "color" in item['question'].lower()
]
print(f"Color questions: {len(questions_about_color)}")
```
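Because each question has exactly four candidates, random guessing yields a 25% baseline, so a quick way to sanity-check an evaluation loop is to run it with a trivial model. The sketch below is a self-contained version of that check; `FirstChoiceModel`, the compact `evaluate_plotqa`, and the second sample record are stand-ins for illustration, not part of the repository:

```python
from typing import Dict, List

class FirstChoiceModel:
    """Trivial stand-in model that always picks the first candidate."""
    def predict(self, video_path: str, question: str,
                candidates: List[str]) -> str:
        return candidates[0]

def evaluate_plotqa(model, data: List[Dict]) -> float:
    """Fraction of questions where the model's prediction matches the answer."""
    correct = sum(
        model.predict(f"videos/{item['video']}",
                      item['question'],
                      item['candidates']) == item['answer']
        for item in data
    )
    return correct / len(data)

# Two hand-made records (the second is entirely hypothetical)
sample_data = [
    {"video": "movie101_66.mp4",
     "question": "What color is the main male character in the video?",
     "candidates": ["Yellow", "Red", "Green", "Blue"], "answer": "Yellow"},
    {"video": "example.mp4", "question": "placeholder question",
     "candidates": ["A", "B", "C", "D"], "answer": "C"},
]
print(evaluate_plotqa(FirstChoiceModel(), sample_data))  # 0.5
```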