Implementation:Open compass VLMEvalKit WorldSense Utils
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Evaluation, Video Understanding, Multi-domain |
Overview
Provides multi-dimensional evaluation utilities for the WorldSense video understanding benchmark with domain and duration categorization.
Description
This module defines comprehensive categorization structures for WorldSense evaluation: `DURATIONS` with six time buckets (<1min through >8min), `DOMAINS` covering eight content areas (Tech & Science, Culture & Politics, Daily Life, Film & TV, Performance, Games, Sports, Music), and `SUB_CATEGORIES` with 60+ fine-grained topic classifications from Academic Lectures to Painting & Photography. It uses `extract_answer_from_item` for multiple-choice answer extraction.
Usage
Called internally by the corresponding dataset class during evaluation.
Code Reference
- Source: `vlmeval/dataset/utils/worldsense.py`, Lines: L1-238
- Import: `from vlmeval.dataset.utils.worldsense import DURATIONS, DOMAINS, SUB_CATEGORIES`
Key Constants:
DURATIONS = ['<1min', '1-2min', '2-4min', '4-6min', '6-8min', '>8min']
DOMAINS = ['Tech & Science', 'Culture & Politics', ...]
SUB_CATEGORIES = ['Academic Lectures', 'Auto', ...]
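As a rough illustration of how the `DURATIONS` labels partition video length, the sketch below maps a length in seconds to one of the six buckets. The helper `duration_bucket`, the `_UPPER_BOUNDS` list, and the half-open boundary handling are assumptions made for this example; they are not taken from the module, and the benchmark metadata may already carry the bucket label for each video.

```python
# Labels copied from the listing above; redefined here so the sketch is self-contained.
DURATIONS = ['<1min', '1-2min', '2-4min', '4-6min', '6-8min', '>8min']

# Hypothetical exclusive upper bounds (in seconds) for every bucket except the last.
_UPPER_BOUNDS = [60, 120, 240, 360, 480]


def duration_bucket(seconds):
    """Return the DURATIONS label for a video length given in seconds.

    Assumption: buckets are half-open, so exactly 60 s falls in '1-2min'
    and exactly 480 s falls in '>8min'.
    """
    for label, upper in zip(DURATIONS, _UPPER_BOUNDS):
        if seconds < upper:
            return label
    return DURATIONS[-1]


short_clip = duration_bucket(30)
long_clip = duration_bucket(600)
```

Under these assumed boundaries, a 30-second clip lands in the first bucket and a 10-minute video in the last.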
I/O Contract
| Direction | Description |
|---|---|
| Inputs | Model predictions and ground truth with duration, domain, and sub-category metadata |
| Outputs | Per-dimension accuracy breakdowns across time buckets, domains, and sub-categories |
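The contract above can be pictured with a minimal sketch of per-dimension accuracy aggregation. It does not reproduce the module's code: the function `accuracy_by`, the record layout, and the field names `duration`, `domain`, and `hit` are hypothetical, chosen only to show how scored predictions might be grouped by a metadata field.

```python
from collections import defaultdict

# Labels copied from the module listing; redefined so the sketch runs on its own.
DURATIONS = ['<1min', '1-2min', '2-4min', '4-6min', '6-8min', '>8min']


def accuracy_by(records, key, order=None):
    """Group scored records by `key` and return {group: accuracy in percent}.

    `records` is a list of dicts with a boolean 'hit' field (hypothetical layout).
    `order` optionally fixes the output order; groups with no records are skipped.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        hits[group] += int(rec['hit'])
    groups = order if order is not None else sorted(totals)
    return {g: 100.0 * hits[g] / totals[g] for g in groups if totals[g] > 0}


# Toy data, invented for illustration only.
records = [
    {'duration': '<1min', 'domain': 'Music', 'hit': True},
    {'duration': '<1min', 'domain': 'Sports', 'hit': False},
    {'duration': '>8min', 'domain': 'Music', 'hit': True},
]

by_duration = accuracy_by(records, 'duration', order=DURATIONS)
by_domain = accuracy_by(records, 'domain')
```

Passing `order=DURATIONS` keeps the time buckets in their natural sequence rather than alphabetical order, which tends to make a breakdown table easier to read.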
Usage Examples
from vlmeval.dataset.utils.worldsense import DOMAINS, DURATIONS
print(DURATIONS)