Implementation:EvolvingLMMs Lab Lmms eval TextCaps Utils
Source File: `lmms_eval/tasks/textcaps/utils.py`
Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]
Overview
The TextCaps Utils module provides evaluation functions for the TextCaps benchmark, which focuses on image captioning with emphasis on recognizing and incorporating text present in images. It uses COCO captioning metrics and supports both validation and test set evaluation with submission file generation.
Key Functions
Document Processing
`textcaps_doc_to_visual(doc)` - Prepares image for model input
- Converts document image to RGB format
- Returns list containing single image
`textcaps_doc_to_text(doc, lmms_eval_specific_kwargs=None)` - Generates the question prompt
- Extracts prompt from kwargs (task-specific prompt configuration)
- Returns the configured prompt string
- Allows flexible prompt engineering via YAML configuration
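The two document-processing hooks can be sketched as follows. This is a minimal illustration, not the module's verbatim code: the `"image"` and `"prompt"` key names are assumptions, and `StubImage` is a hypothetical stand-in for a PIL-style image so the example runs without extra dependencies.

```python
class StubImage:
    """Hypothetical stand-in for a PIL image; it only tracks the colour mode."""

    def __init__(self, mode="RGBA"):
        self.mode = mode

    def convert(self, mode):
        # Mirrors the shape of PIL's Image.convert(): returns a new image.
        return StubImage(mode)


def textcaps_doc_to_visual(doc):
    # Assumption: the image is stored under doc["image"].
    return [doc["image"].convert("RGB")]


def textcaps_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    # Assumption: the YAML-configured prompt arrives under the "prompt" key.
    return lmms_eval_specific_kwargs["prompt"]


doc = {"image": StubImage("RGBA")}
visuals = textcaps_doc_to_visual(doc)
prompt = textcaps_doc_to_text(
    doc, {"prompt": "Describe the image, including any visible text."}
)
```

Because the prompt comes entirely from the kwargs, changing the wording only requires editing the task's YAML configuration.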
Results Processing
`textcaps_process_result(doc, result)` - Processes model prediction for validation set
- Extracts prediction from result list (empty string if no result)
- Creates data dictionary with:
  - Ground truth captions (`caption_str`)
  - Model prediction
  - Image ID for matching
- Returns dictionary mapping each metric name to the data dictionary
- Enables multi-metric evaluation
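A minimal sketch of the validation-side result processing described above. The dictionary keys `"answer"`, `"pred"`, and `"image_id"`, and the `textcaps_<metric>` naming of the returned keys, are assumptions made for illustration; only `caption_str` and the metric list come from this page.

```python
TEXTCAPS_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
                    "METEOR", "ROUGE_L", "CIDEr"]


def textcaps_process_result(doc, result):
    # Fall back to an empty caption when the model returned nothing.
    pred = result[0] if len(result) > 0 else ""
    data_dict = {
        "answer": doc["caption_str"],  # ground-truth captions
        "pred": pred,                  # model prediction
        "image_id": doc["image_id"],   # used to match predictions to references
    }
    # Every metric receives the same payload, so each can be aggregated later.
    return {f"textcaps_{metric}": data_dict for metric in TEXTCAPS_METRICS}


sample_doc = {"caption_str": ["a red stop sign", "a sign reading stop"],
              "image_id": "img_0"}
with_prediction = textcaps_process_result(sample_doc, ["a stop sign"])
without_prediction = textcaps_process_result(sample_doc, [])
```

Sharing one payload across all metric keys lets each metric run its own aggregation over identical inputs.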
`textcaps_test_process_result(doc, result)` - Processes model prediction for test set
- Creates passthrough data structure with prediction and image ID
- Returns dictionary with "textcaps_passthrough" metric
- Used when ground truth is not available (test set submission)
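The test-set path can be sketched in a few lines. The inner key names and the choice to take the first element of `result` (with an empty-string fallback) are assumptions; the `"textcaps_passthrough"` key is the one named above.

```python
def textcaps_test_process_result(doc, result):
    # Assumption: the first element of `result` is the generated caption.
    pred = result[0] if len(result) > 0 else ""
    # No ground truth is attached; the payload is only forwarded for submission.
    return {"textcaps_passthrough": {"pred": pred, "image_id": doc["image_id"]}}


passthrough = textcaps_test_process_result({"image_id": "img_7"}, ["a menu board"])
```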
Metrics Aggregation
`textcaps_aggregation_result(results, metric, args=None)` - Aggregates validation set predictions and computes metrics
- Creates COCO-format dataset structure:
  - "annotations" list with multiple captions per image
  - "images" list with unique image IDs
- Builds predictions list with image ID and caption
- Initializes COCO evaluation:
  - Creates COCO object from ground truth annotations
  - Loads predictions using `coco.loadRes()`
  - Initializes COCOEvalCap for metric computation
- Tokenizes texts using PTBTokenizer
- Computes requested metric using appropriate scorer
- Handles Bleu metrics (extracts specific n-gram score from list)
- Generates submission file using `generate_submission_file`
- Saves predictions to JSON file
- Returns scalar metric score
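The data-preparation part of the aggregation can be sketched in plain Python, leaving out the `pycocotools` and `pycocoevalcap` calls. The helper names `build_coco_inputs` and `select_score` are hypothetical, as are the per-result keys `"answer"`, `"pred"`, and `"image_id"`. The Bleu handling relies on the Bleu scorer returning a list of four scores, one per n-gram order.

```python
def build_coco_inputs(results):
    """Turn per-sample results into a COCO-style dataset plus predictions."""
    dataset = {"annotations": [], "images": []}
    predictions = []
    annotation_id = 0
    for result in results:
        image_id = result["image_id"]
        predictions.append({"image_id": image_id, "caption": result["pred"]})
        # One annotation per reference caption, so an image can have several.
        for caption in result["answer"]:
            dataset["annotations"].append(
                {"image_id": image_id, "caption": caption, "id": annotation_id}
            )
            annotation_id += 1
        dataset["images"].append({"id": image_id})
    return dataset, predictions


def select_score(metric, score):
    # Bleu yields [Bleu_1, Bleu_2, Bleu_3, Bleu_4]; pick the requested order.
    if metric.startswith("Bleu_"):
        n = int(metric.split("_")[-1])
        return score[n - 1]
    return score


sample_results = [
    {"answer": ["a red stop sign", "a sign reading stop"],
     "pred": "a stop sign", "image_id": "img_0"},
    {"answer": ["a coffee cup"], "pred": "a cup", "image_id": "img_1"},
]
dataset, predictions = build_coco_inputs(sample_results)
```

In the real pipeline the `dataset` dictionary would feed the COCO object and `predictions` would be passed to `coco.loadRes()` before scoring.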
`textcaps_test_aggregation_result(results, args)` - Aggregates test set predictions for submission
- Collects predictions with image IDs
- Generates submission file named "textcaps_captions_test2014_alg_results.json"
- Saves JSON file in proper format for server submission
- Logs submission instructions with CodaLab URL
- No metrics computed (test labels not available)
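A minimal sketch of the test-set aggregation, assuming each result carries `"pred"` and `"image_id"` keys. The real module resolves the output path through `generate_submission_file`; here a hypothetical `output_dir` argument and `os.path.join` stand in for it, and the logging of submission instructions is omitted.

```python
import json
import os
import tempfile


def write_test_submission(results, output_dir):
    # Collect one {image_id, caption} record per prediction.
    stored_results = [
        {"image_id": r["image_id"], "caption": r["pred"]} for r in results
    ]
    path = os.path.join(output_dir, "textcaps_captions_test2014_alg_results.json")
    with open(path, "w") as f:
        json.dump(stored_results, f, indent=4)
    # No score is returned: test labels are not available locally.
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    submission_path = write_test_submission(
        [{"pred": "a menu board", "image_id": "img_7"}], tmp_dir
    )
    with open(submission_path) as f:
        loaded = json.load(f)
```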
Metric-Specific Functions
`textcaps_bleu1(results, args=None)` through `textcaps_bleu4(results, args=None)` - Compute BLEU scores at different n-gram levels
- BLEU-1 through BLEU-4 for n-gram precision
`textcaps_meteor(results, args=None)` - Computes METEOR score
- Considers synonyms and word order
`textcaps_rougel(results, args=None)` - Computes ROUGE-L score
- Longest common subsequence-based metric
`textcaps_cider(results, args=None)` - Computes CIDEr score
- Consensus-based metric using TF-IDF
`textcaps_spice(results, args=None)` - Computes SPICE score
- Scene graph-based semantic evaluation
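The metric-specific functions are most naturally read as thin wrappers that fix the metric name and delegate to the shared aggregation routine. The sketch below assumes that delegation pattern. `textcaps_aggregation_result` is replaced by a placeholder that simply echoes the metric name, so the dispatch is visible without running the COCO scorers.

```python
def textcaps_aggregation_result(results, metric, args=None):
    # Placeholder for the real aggregation; it returns the metric name
    # instead of a score so the dispatch can be inspected.
    return metric


def textcaps_bleu4(results, args=None):
    return textcaps_aggregation_result(results, "Bleu_4", args)


def textcaps_rougel(results, args=None):
    return textcaps_aggregation_result(results, "ROUGE_L", args)


def textcaps_cider(results, args=None):
    return textcaps_aggregation_result(results, "CIDEr", args)


dispatched = [fn([]) for fn in (textcaps_bleu4, textcaps_rougel, textcaps_cider)]
```

Keeping one wrapper per metric lets a task configuration refer to each aggregation by function name while the scoring logic stays in one place.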
Configuration
Active Metrics
TEXTCAPS_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
"METEOR", "ROUGE_L", "CIDEr"]
SPICE is implemented but commented out from the default metrics list.
Submission Files
- Validation: `textcaps_captions_val2014_alg_results.json`
- Test: `textcaps_captions_test2014_alg_results.json`
Both files are generated in the output directory for potential server submission.
Design Characteristics
- Multi-Caption Support: Handles multiple reference captions per image
- Submission Generation: Automatically creates properly formatted submission files
- Standard Metrics: Uses established COCO captioning evaluation framework
- Test Set Handling: Separate processing path for test set without metrics
- Server Integration: Provides submission instructions for CodaLab evaluation server
- Comprehensive Evaluation: Supports 7 different captioning metrics
Dependencies
- `json` - JSON file operations for submission files
- `os` - File path operations
- `loguru.logger` - Logging evaluation progress
- `pycocoevalcap.eval` - COCO metric implementations (Bleu, Cider, Meteor, Rouge)
- `pycocoevalcap.tokenizer.ptbtokenizer.PTBTokenizer` - Text tokenization
- `pycocotools.coco.COCO` - COCO dataset handling
- `lmms_eval.tasks._task_utils.file_utils.generate_submission_file` - Submission file path generation
Usage Context
This module supports the TextCaps benchmark, which evaluates models' ability to generate captions that incorporate text visible in images (e.g., store signs, product labels, documents). It provides both local validation metrics and test set submission file generation for official evaluation.