Implementation:EvolvingLMMs Lab Lmms eval TextCaps Utils

From Leeroopedia

Source File: `lmms_eval/tasks/textcaps/utils.py`

Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]

Overview

The TextCaps Utils module provides evaluation functions for the TextCaps benchmark, an image captioning task that emphasizes recognizing and incorporating text present in images. It scores validation predictions with COCO captioning metrics and generates submission files for both the validation and test sets.

Key Functions

Document Processing

textcaps_doc_to_visual(doc)
Prepares the image for model input
  • Converts the document image to RGB format
  • Returns a list containing the single image
textcaps_doc_to_text(doc, lmms_eval_specific_kwargs=None)
Generates the question prompt
  • Extracts prompt from kwargs (task-specific prompt configuration)
  • Returns the configured prompt string
  • Allows flexible prompt engineering via YAML configuration
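
The two document-processing functions described above can be sketched as follows. This is a minimal illustration, not the module's verbatim source: the `"image"` and `"prompt"` keys are assumptions, and `FakeImage` is a hypothetical stand-in for a PIL image so the snippet runs without extra dependencies.

```python
class FakeImage:
    """Hypothetical stand-in for a PIL image; it only tracks the color mode."""

    def __init__(self, mode="RGBA"):
        self.mode = mode

    def convert(self, mode):
        # Mirrors the shape of PIL's Image.convert: returns a new image object.
        return FakeImage(mode)


def textcaps_doc_to_visual(doc):
    # Convert the document image to RGB and wrap it in a one-element list.
    # The "image" key is an assumption about the dataset schema.
    return [doc["image"].convert("RGB")]


def textcaps_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    # The prompt comes from the task configuration, not from the document.
    # The "prompt" key is an assumption about the YAML layout.
    return lmms_eval_specific_kwargs["prompt"]


doc = {"image": FakeImage("RGBA")}
visuals = textcaps_doc_to_visual(doc)
prompt = textcaps_doc_to_text(doc, {"prompt": "Describe the image, including any text."})
```

Because the prompt is read from the kwargs, changing the wording only requires editing the task configuration, not the Python code.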

Results Processing

textcaps_process_result(doc, result)
Processes model prediction for validation set
  • Extracts prediction from result list (empty string if no result)
  • Creates data dictionary with:
    • Ground truth captions (caption_str)
    • Model prediction
    • Image ID for matching
  • Returns a dictionary mapping each metric name to the same data dictionary
  • Lets every metric's aggregation function receive the full prediction record
textcaps_test_process_result(doc, result)
Processes model prediction for test set
  • Creates passthrough data structure with prediction and image ID
  • Returns dictionary with "textcaps_passthrough" metric
  • Used when ground truth is not available (test set submission)
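
A sketch of the two result-processing paths described above. The `caption_str` field and the `"textcaps_passthrough"` key come from the description; the `"answer"`, `"pred"`, and `"image_id"` key names, the `textcaps_` prefix on metric keys, and the empty-string fallback in the test path are assumptions made for illustration.

```python
TEXTCAPS_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
                    "METEOR", "ROUGE_L", "CIDEr"]


def textcaps_process_result(doc, result):
    # Use the first prediction, or an empty string if the model returned nothing.
    pred = result[0] if len(result) > 0 else ""
    data_dict = {
        "answer": doc["caption_str"],  # ground truth captions
        "pred": pred,                  # model prediction
        "image_id": doc["image_id"],   # used to match predictions to references
    }
    # Every metric name maps to the same record (key prefix is an assumption).
    return {f"textcaps_{metric}": data_dict for metric in TEXTCAPS_METRICS}


def textcaps_test_process_result(doc, result):
    # No ground truth on the test set: pass the prediction through unchanged.
    pred = result[0] if len(result) > 0 else ""
    return {"textcaps_passthrough": {"pred": pred, "image_id": doc["image_id"]}}


val_doc = {"caption_str": ["a sign that says stop"], "image_id": 7}
val_out = textcaps_process_result(val_doc, ["a stop sign"])
empty_out = textcaps_process_result(val_doc, [])
test_out = textcaps_test_process_result({"image_id": 9}, ["a cola bottle"])
```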

Metrics Aggregation

textcaps_aggregation_result(results, metric, args=None)
Aggregates validation set predictions and computes metrics
  • Creates COCO-format dataset structure:
    • "annotations" list with multiple captions per image
    • "images" list with unique image IDs
  • Builds predictions list with image ID and caption
  • Initializes COCO evaluation:
    • Creates COCO object from ground truth annotations
    • Loads predictions using coco.loadRes()
    • Initializes COCOEvalCap for metric computation
  • Tokenizes texts using PTBTokenizer
  • Computes requested metric using appropriate scorer
  • For Bleu metrics, the scorer returns a list with one score per n-gram order, so the entry for the requested n is extracted
  • Resolves the submission file path using generate_submission_file
  • Saves predictions to that JSON file
  • Returns scalar metric score
textcaps_test_aggregation_result(results, args)
Aggregates test set predictions for submission
  • Collects predictions with image IDs
  • Generates submission file named "textcaps_captions_test2014_alg_results.json"
  • Saves JSON file in proper format for server submission
  • Logs submission instructions with CodaLab URL
  • No metrics computed (test labels not available)
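
The COCO-format structures that the validation aggregation builds can be sketched in plain Python. This illustration stops before the `pycocotools` and `pycocoevalcap` calls; `build_coco_inputs` is a hypothetical helper name, the record keys (`"answer"`, `"pred"`, `"image_id"`) are assumptions, and the sketch assumes one result record per image.

```python
def build_coco_inputs(results):
    """Turn per-image result records into COCO-style ground truth and predictions."""
    dataset = {"annotations": [], "images": []}
    predictions = []
    ann_id = 0
    for item in results:
        image_id = item["image_id"]
        # One prediction entry per image, in the shape COCO result files use.
        predictions.append({"image_id": image_id, "caption": item["pred"]})
        # One annotation per reference caption, each with a unique id.
        for caption in item["answer"]:
            dataset["annotations"].append(
                {"image_id": image_id, "caption": caption, "id": ann_id}
            )
            ann_id += 1
        dataset["images"].append({"id": image_id})
    return dataset, predictions


results = [
    {"image_id": 1, "answer": ["a red stop sign", "a sign that says stop"], "pred": "a stop sign"},
    {"image_id": 2, "answer": ["a bottle labeled cola"], "pred": "a cola bottle"},
]
dataset, predictions = build_coco_inputs(results)
```

In the module, the ground truth structure is then loaded into a COCO object and the predictions list is passed to `coco.loadRes()` before tokenization and scoring.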

Metric-Specific Functions

textcaps_bleu1(results, args=None) through textcaps_bleu4(results, args=None)
Compute BLEU scores at different n-gram levels
  • BLEU-1 through BLEU-4 for n-gram precision
textcaps_meteor(results, args=None)
Computes METEOR score
  • Considers synonyms and word order
textcaps_rougel(results, args=None)
Computes ROUGE-L score
  • Longest common subsequence-based metric
textcaps_cider(results, args=None)
Computes CIDEr score
  • Consensus-based metric using TF-IDF-weighted n-gram similarity
textcaps_spice(results, args=None)
Computes SPICE score
  • Scene graph-based semantic evaluation
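
The metric-specific functions can be read as thin wrappers that pass a metric name to the shared aggregation routine. The sketch below illustrates that pattern together with the Bleu index selection. `aggregation_stub` and `pick_score` are hypothetical stand-ins, and the numbers in `PLACEHOLDER_SCORES` are dummy values for demonstration, not real results.

```python
# Dummy values only; a Bleu scorer yields one score per n-gram order (1 to 4).
PLACEHOLDER_SCORES = {"Bleu": [0.4, 0.3, 0.2, 0.1], "CIDEr": 1.5}


def pick_score(metric, raw_score):
    """Reduce a scorer's output to a scalar for the requested metric."""
    if metric.startswith("Bleu_"):
        n = int(metric.split("_")[-1])  # "Bleu_3" -> 3
        return raw_score[n - 1]         # list is ordered Bleu-1 .. Bleu-4
    return raw_score


def aggregation_stub(results, metric, args=None):
    # Hypothetical stand-in for textcaps_aggregation_result: looks up a
    # placeholder score instead of running the COCO evaluation.
    scorer_name = metric.split("_")[0] if metric.startswith("Bleu_") else metric
    return pick_score(metric, PLACEHOLDER_SCORES[scorer_name])


def textcaps_bleu1(results, args=None):
    return aggregation_stub(results, "Bleu_1", args)


def textcaps_bleu4(results, args=None):
    return aggregation_stub(results, "Bleu_4", args)


def textcaps_cider(results, args=None):
    return aggregation_stub(results, "CIDEr", args)


bleu1 = textcaps_bleu1([])
bleu4 = textcaps_bleu4([])
cider = textcaps_cider([])
```

Keeping one wrapper per metric gives each metric a named entry point while the evaluation logic stays in one place.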

Configuration

Active Metrics

TEXTCAPS_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
                    "METEOR", "ROUGE_L", "CIDEr"]

SPICE is implemented but commented out of the default metrics list.

Submission Files

  • Validation: textcaps_captions_val2014_alg_results.json
  • Test: textcaps_captions_test2014_alg_results.json

Both files are generated in the output directory for potential server submission.
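
Writing a submission file could look roughly like the sketch below. `submission_path` is a hypothetical stand-in for `generate_submission_file`, whose exact behavior is not shown on this page; the `submissions` subdirectory and the JSON indentation are assumptions.

```python
import json
import os
import tempfile


def submission_path(file_name, output_dir, subpath="submissions"):
    # Hypothetical stand-in for generate_submission_file: ensure the target
    # directory exists and return an absolute path to the file.
    directory = os.path.join(output_dir, subpath)
    os.makedirs(directory, exist_ok=True)
    return os.path.abspath(os.path.join(directory, file_name))


def write_submission(preds, file_name, output_dir):
    # Dump the list of {"image_id", "caption"} records as JSON.
    path = submission_path(file_name, output_dir)
    with open(path, "w") as f:
        json.dump(preds, f, indent=4)
    return path


with tempfile.TemporaryDirectory() as tmp:
    preds = [{"image_id": 9, "caption": "a cola bottle"}]
    path = write_submission(preds, "textcaps_captions_test2014_alg_results.json", tmp)
    with open(path) as f:
        loaded = json.load(f)
    written_name = os.path.basename(path)
```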

Design Characteristics

  • Multi-Caption Support: Handles multiple reference captions per image
  • Submission Generation: Automatically creates properly formatted submission files
  • Standard Metrics: Uses established COCO captioning evaluation framework
  • Test Set Handling: Separate processing path for test set without metrics
  • Server Integration: Provides submission instructions for CodaLab evaluation server
  • Comprehensive Evaluation: Enables 7 captioning metrics by default (BLEU-1 through BLEU-4, METEOR, ROUGE-L, CIDEr); SPICE is implemented but disabled

Dependencies

  • json - JSON file operations for submission files
  • os - File path operations
  • loguru.logger - Logging evaluation progress
  • pycocoevalcap.eval - COCO metric implementations (Bleu, Cider, Meteor, Rouge)
  • pycocoevalcap.tokenizer.ptbtokenizer.PTBTokenizer - Text tokenization
  • pycocotools.coco.COCO - COCO dataset handling
  • lmms_eval.tasks._task_utils.file_utils.generate_submission_file - Submission file path generation

Usage Context

This module supports the TextCaps benchmark, which evaluates models' ability to generate captions that incorporate text visible in images (e.g., store signs, product labels, documents). It provides both local validation metrics and test set submission file generation for official evaluation.
