Implementation:EvolvingLMMs Lab Lmms eval VATEX Utils

From Leeroopedia

Source File: `lmms_eval/tasks/vatex/utils.py`

Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]

Overview

The VATEX Utils module provides evaluation functions for VATEX (Video And TEXt), a multilingual video captioning benchmark. It builds few-shot prompts for English and Chinese caption generation, scores validation set predictions, and writes submission files for both the validation and test sets.

Key Functions

Document Processing

vatex_ZH_doc_to_visual(doc)
Prepares video path for Chinese validation set
  • Reads YAML configuration to get cache directory
  • Constructs video path from video ID and cache location
  • Checks multiple file extensions: .mp4, .MP4, .mkv
  • Exits with error if video not found
  • Returns list containing video file path
vatex_test_doc_to_visual(doc)
Prepares video path for test set
  • Follows the same lookup logic as vatex_ZH_doc_to_visual
  • Reads the cache directory from the test set YAML configuration
  • Checks multiple video file extensions
  • Returns list containing video file path
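
The lookup described above can be sketched as follows. This is a minimal illustration, not the module's code: resolve_video_path is a hypothetical helper, and it takes the cache directory and video ID as arguments instead of reading them from the YAML configuration and the doc.

```python
import os
import sys


def resolve_video_path(cache_dir, video_id, extensions=(".mp4", ".MP4", ".mkv")):
    """Return [path] for the first existing cache_dir/video_id + extension.

    Hypothetical helper; the extension order mirrors the list on this page.
    """
    for ext in extensions:
        candidate = os.path.join(cache_dir, video_id + ext)
        if os.path.exists(candidate):
            # A one-element list, matching the documented return shape.
            return [candidate]
    # No candidate exists: exit with an error message, as the page describes.
    sys.exit(f"video path for {video_id!r} not found under {cache_dir}")
```

Returning a list rather than a bare string keeps the return shape the same as the documented doc_to_visual functions.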

Prompt Generation

vatex_ZH_doc_to_text(doc, lmms_eval_specific_kwargs=None)
Generates prompt for Chinese caption generation
  • Includes 4-shot examples in Chinese:
    • Video 1: Mountain climbing scene
    • Video 2: Simulated drumming
    • Video 3: Hand gestures at desk
    • Video 4: Applying face cream
  • Appends configured prompt from kwargs
  • Returns formatted prompt with examples
vatex_test_doc_to_text(doc, lmms_eval_specific_kwargs=None)
Generates prompt for English test set
  • Includes 4-shot examples in English:
    • Video 1: Shoe care items
    • Video 2: Cooking with frying pan
    • Video 3: Cross stitch demonstration
    • Video 4: Girl doing flips
  • Appends configured prompt from kwargs
  • Returns formatted prompt with examples
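
A few-shot prompt of this shape could be assembled roughly as below. The helper name build_few_shot_prompt, the "[Video N] Output:" line format, and the "prompt" kwargs key are assumptions made for illustration. The strings in EN_EXAMPLE_TOPICS are the topic summaries from this page, not the actual example captions used by the module.

```python
# Topic summaries from this page, standing in for the real few-shot captions.
EN_EXAMPLE_TOPICS = [
    "Shoe care items",
    "Cooking with frying pan",
    "Cross stitch demonstration",
    "Girl doing flips",
]


def build_few_shot_prompt(example_captions, lmms_eval_specific_kwargs=None):
    """Join numbered example captions, then append the configured prompt.

    The "prompt" key is an assumed name for the configured prompt text.
    """
    kwargs = lmms_eval_specific_kwargs or {}
    lines = [
        f"[Video {i}] Output: {caption}"
        for i, caption in enumerate(example_captions, start=1)
    ]
    lines.append(kwargs.get("prompt", ""))
    return "\n".join(lines)
```

The Chinese variant would differ only in the example list and the configured prompt passed in.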

Results Processing

vatex_process_result(doc, result)
Processes English caption predictions
  • Extracts prediction from result list
  • Creates data dictionary with:
    • English ground truth captions (enCap)
    • Model prediction
    • Video ID
  • Returns dictionary mapping each metric to the data dictionary
vatex_process_CN_result(doc, result)
Processes Chinese caption predictions
  • Builds the same data dictionary as vatex_process_result
  • Uses Chinese ground truth captions (chCap)
  • Returns metric-mapped data dictionary
vatex_test_process_result(doc, result)
Processes test set predictions
  • Creates passthrough structure (no metrics computed)
  • Returns dictionary with image ID and prediction
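
The metric-mapped return value can be illustrated with a short sketch. The dictionary keys ("answer", "pred", "video_id"), the "videoID" doc field, and the "vatex_" key prefix are assumptions. Only the enCap and chCap field names and the metric list come from this page.

```python
VATEX_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
                 "METEOR", "ROUGE_L", "CIDEr"]


def process_result(doc, result, caption_key="enCap"):
    """Map every metric name to one shared data dictionary.

    Pass caption_key="chCap" for the Chinese variant.
    """
    pred = result[0] if result else ""
    data = {
        "answer": doc[caption_key],   # list of reference captions
        "pred": pred,                 # model prediction
        "video_id": doc["videoID"],   # assumed doc field name
    }
    # Each metric receives the same dictionary; scoring happens at aggregation.
    return {f"vatex_{metric}": data for metric in VATEX_METRICS}
```

Sharing one dictionary across metrics defers all scoring to the aggregation step, where the full set of predictions is available.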

Metrics Aggregation

vatex_aggregation_result(results, metric, args=None)
Aggregates predictions and computes validation metrics
  • Creates COCO-format dataset structure:
    • Uses video IDs as image IDs
    • Multiple reference captions per video
  • Initializes COCO evaluation pipeline
  • Tokenizes using PTBTokenizer
  • Computes requested metric
  • For Bleu metrics, extracts the requested n-gram score from the list of scores the scorer returns
  • Generates submission file "vatex_captions_val_results.json"
  • Saves predictions to JSON
  • Returns scalar metric score
vatex_test_aggregation_result(results, args)
Aggregates test set predictions for submission
  • Collects predictions with image IDs
  • Generates submission file "vatex_captions_test2014_alg_results.json"
  • Provides submission instructions
  • No metrics computed
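
The COCO-format structure used for validation scoring can be sketched in plain Python. The helper build_coco_structures and the per-item keys ("video_id", "answer", "pred") are hypothetical. The "images"/"annotations" layout with "image_id", "id", and "caption" fields is the standard COCO captions format that pycocotools expects.

```python
def build_coco_structures(results):
    """Build a COCO-style reference dataset and a prediction list.

    Video IDs stand in for image IDs; each video may carry several
    reference captions, each stored as its own annotation.
    """
    dataset = {"images": [], "annotations": []}
    predictions = []
    annotation_id = 0
    for item in results:
        video_id = item["video_id"]
        dataset["images"].append({"id": video_id})
        for reference in item["answer"]:
            dataset["annotations"].append(
                {"image_id": video_id, "caption": reference, "id": annotation_id}
            )
            annotation_id += 1
        # One predicted caption per video, in COCO results format.
        predictions.append({"image_id": video_id, "caption": item["pred"]})
    return dataset, predictions
```

The prediction list has the same shape as a COCO captions results file, which is also the shape written to the submission JSON in this sketch.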

Metric-Specific Functions

vatex_bleu1(results, args=None) through vatex_bleu4(results, args=None)
Compute BLEU scores at n-gram levels 1-4
vatex_meteor(results, args=None)
Computes METEOR score
vatex_rougel(results, args=None)
Computes ROUGE-L score
vatex_cider(results, args=None)
Computes CIDEr score
vatex_spice(results, args=None)
Computes SPICE score; note that SPICE does not appear in the active VATEX_METRICS list under Configuration
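
Each metric-specific function is a thin wrapper that fixes the metric name and delegates to the shared aggregation function. The sketch below shows that pattern together with Bleu score selection. make_metric_fn and extract_score are hypothetical helpers, and the aggregator is passed in as a parameter so the example runs without pycocoevalcap.

```python
def extract_score(metric, score):
    """Pick the scalar for a metric.

    The pycocoevalcap Bleu scorer returns a list of four scores
    (1-gram to 4-gram), so "Bleu_n" selects index n - 1. Other
    scorers already return a single number.
    """
    if metric.startswith("Bleu_"):
        n = int(metric.split("_")[-1])
        return score[n - 1]
    return score


def make_metric_fn(aggregate, metric):
    """Return a (results, args=None) function bound to one metric name."""
    def metric_fn(results, args=None):
        return aggregate(results, metric, args)
    return metric_fn
```

With a real aggregator in place, vatex_cider would correspond to make_metric_fn(vatex_aggregation_result, "CIDEr"), and the other wrappers follow the same pattern.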

Configuration

Active Metrics

VATEX_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
                 "METEOR", "ROUGE_L", "CIDEr"]

Cache Directory

Base cache directory is determined from:

  • Environment variable HF_HOME
  • Default: ~/.cache/huggingface/
  • Task-specific cache path read from YAML configuration
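
The resolution order above can be sketched as follows. resolve_cache_dir is a hypothetical helper, and the task-specific directory name is passed in directly because the YAML key that holds it is not shown on this page.

```python
import os


def resolve_cache_dir(task_cache_name, env=None):
    """Join the Hugging Face base cache directory with a task-specific name.

    Uses HF_HOME when set, otherwise ~/.cache/huggingface/.
    """
    env = os.environ if env is None else env
    base = os.path.expanduser(env.get("HF_HOME", "~/.cache/huggingface/"))
    return os.path.join(base, task_cache_name)
```

Passing env explicitly makes the helper easy to test without touching the real process environment.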

Few-Shot Examples

Both English and Chinese prompts include 4 example videos with reference captions to demonstrate the task format.

Design Characteristics

  • Multilingual Support: Separate processing for English and Chinese captions
  • Few-Shot Learning: Includes example videos in prompts
  • Video Format Handling: Checks multiple video file extensions
  • Configuration-Driven: Reads cache paths from YAML files
  • Multi-Reference Evaluation: Supports multiple ground truth captions per video
  • Submission Generation: Creates files for server evaluation
  • COCO Evaluation Framework: Uses standard captioning metrics

Dependencies

  • json - JSON operations for submission files
  • os - File system operations
  • sys - System exit on errors
  • pathlib.Path - Path manipulation
  • yaml - YAML configuration parsing
  • loguru.logger - Logging
  • pycocoevalcap.eval - Captioning metrics (Bleu, Cider, Meteor, Rouge)
  • pycocoevalcap.tokenizer.ptbtokenizer.PTBTokenizer - Tokenization
  • pycocotools.coco.COCO - COCO dataset handling
  • lmms_eval.tasks._task_utils.file_utils.generate_submission_file - File generation

Usage Context

This module supports the VATEX video captioning benchmark, which tests models' ability to describe video content in natural language. It handles both English and Chinese evaluation, uses few-shot prompting to guide models, and generates submission files for official benchmark evaluation.
