Implementation:EvolvingLMMs Lab Lmms eval TextCaps Utils

Source File: `lmms_eval/tasks/textcaps/utils.py`

Principle: [[../principles/EvolvingLMMs_Lab_Lmms_eval_Task_Utility_Functions|Task_Utility_Functions]]

Overview

The TextCaps Utils module provides evaluation functions for the TextCaps benchmark, an image captioning task that emphasizes recognizing text in images and incorporating it into the caption. It scores captions with COCO captioning metrics and supports both validation and test set evaluation, generating a submission file in each case.

Key Functions

Document Processing

textcaps_doc_to_visual(doc)

Prepares image for model input

Converts document image to RGB format
Returns list containing single image
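
A minimal sketch of this step is shown here. It assumes the image is stored under an "image" key and exposes a PIL-style convert(mode) method; the key name is an assumption, not something stated on this page.

```python
def textcaps_doc_to_visual(doc):
    # Assumption: doc["image"] is a PIL-style image with a .convert(mode) method.
    # Converting to RGB normalizes grayscale, palette, or RGBA inputs.
    return [doc["image"].convert("RGB")]
```

The single-element list reflects that each document carries exactly one image, even though the return type is a list.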

textcaps_doc_to_text(doc, lmms_eval_specific_kwargs=None)

Generates the question prompt

Extracts prompt from kwargs (task-specific prompt configuration)
Returns the configured prompt string
Allows flexible prompt engineering via YAML configuration
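
The prompt lookup could look roughly like the following. The "prompt" key name and the empty-string fallback are assumptions made for this sketch; the actual module may simply index the kwargs and fail if they are missing.

```python
def textcaps_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    # Assumption: the YAML task config supplies a "prompt" entry.
    # The document itself is not consulted; every sample gets the same prompt.
    kwargs = lmms_eval_specific_kwargs or {}
    return kwargs.get("prompt", "")


# Example call with a hypothetical prompt string.
prompt = textcaps_doc_to_text({}, {"prompt": "Provide a one-sentence caption for the image."})
```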

Results Processing

textcaps_process_result(doc, result)

Processes model prediction for validation set

Extracts prediction from result list (empty string if no result)
Creates data dictionary with:
- Ground truth captions (caption_str)
- Model prediction
- Image ID for matching
Returns dictionary mapping each metric name to the data dictionary
Enables multi-metric evaluation
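
As a rough illustration of the fan-out described above, the sketch below builds one data dictionary and maps every metric name to it. The "answer" and "pred" keys, the "image_id" document field, and the "textcaps_" key prefix are assumptions chosen for the example.

```python
TEXTCAPS_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1", "METEOR", "ROUGE_L", "CIDEr"]


def textcaps_process_result(doc, result):
    # Fall back to an empty caption when the model returned nothing.
    pred = result[0] if len(result) > 0 else ""
    data_dict = {
        "answer": doc["caption_str"],  # list of reference captions
        "pred": pred,
        "image_id": doc["image_id"],
    }
    # Every metric receives the same dictionary; the aggregation step
    # later decides which scorer to run on it.
    return {f"textcaps_{metric}": data_dict for metric in TEXTCAPS_METRICS}
```

Sharing one dictionary across all keys avoids copying the captions once per metric.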

textcaps_test_process_result(doc, result)

Processes model prediction for test set

Creates passthrough data structure with prediction and image ID
Returns dictionary with "textcaps_passthrough" metric
Used when ground truth is not available (test set submission)
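
The test-set path might be sketched as follows. The "pred" key and the empty-string fallback are assumptions carried over from the validation path for illustration; only the "textcaps_passthrough" key comes from the description above.

```python
def textcaps_test_process_result(doc, result):
    # No reference captions are attached because the test split has none.
    pred = result[0] if len(result) > 0 else ""
    return {"textcaps_passthrough": {"pred": pred, "image_id": doc["image_id"]}}
```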

Metrics Aggregation

textcaps_aggregation_result(results, metric, args=None)

Aggregates validation set predictions and computes metrics

Creates COCO-format dataset structure:
- "annotations" list with multiple captions per image
- "images" list with unique image IDs
Builds predictions list with image ID and caption
Initializes COCO evaluation:
- Creates COCO object from ground truth annotations
- Loads predictions using coco.loadRes()
- Initializes COCOEvalCap for metric computation
Tokenizes texts using PTBTokenizer
Computes requested metric using appropriate scorer
Handles Bleu metrics (extracts specific n-gram score from list)
Generates submission file using generate_submission_file
Saves predictions to JSON file
Returns scalar metric score
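
Two of the steps above can be sketched without the COCO libraries: building the COCO-format structures and picking one value out of a BLEU result. The helper names build_coco_inputs and select_score are hypothetical, as are the "answer" and "pred" keys. The sketch also assumes each result refers to a distinct image, and that a BLEU scorer configured for 4-grams returns a list of four scores.

```python
def build_coco_inputs(results):
    """Build a COCO-style ground-truth dataset and a predictions list."""
    dataset = {"annotations": [], "images": []}
    predictions = []
    ann_id = 0
    for r in results:
        predictions.append({"image_id": r["image_id"], "caption": r["pred"]})
        # One annotation per reference caption, each with a unique id.
        for caption in r["answer"]:
            dataset["annotations"].append(
                {"image_id": r["image_id"], "caption": caption, "id": ann_id}
            )
            ann_id += 1
        dataset["images"].append({"id": r["image_id"]})
    return dataset, predictions


def select_score(score, metric):
    """Reduce a scorer result to a scalar for the requested metric."""
    if isinstance(score, list):
        # "Bleu_3" -> 3 -> index 2 in the list of BLEU-1..BLEU-4 scores.
        n = int(metric.split("_")[-1])
        return score[n - 1]
    return score
```

In the real pipeline the dataset would then be handed to a COCO object and the predictions to coco.loadRes(), as listed above.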

textcaps_test_aggregation_result(results, args)

Aggregates test set predictions for submission

Collects predictions with image IDs
Generates submission file named "textcaps_captions_test2014_alg_results.json"
Saves JSON file in proper format for server submission
Logs submission instructions with CodaLab URL
No metrics computed (test labels not available)
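
A minimal sketch of the collection and save steps follows. The helper name write_test_submission is hypothetical, and the output path is passed in explicitly here rather than resolved through generate_submission_file; the "pred" key and the image_id/caption record layout are assumptions based on the usual COCO caption results format.

```python
import json


def write_test_submission(results, path):
    # Keep only what a caption evaluation server needs: image id and caption.
    submission = [{"image_id": r["image_id"], "caption": r["pred"]} for r in results]
    with open(path, "w") as f:
        json.dump(submission, f, indent=4)
    return submission
```

Because no references exist for the test split, the function only serializes predictions and returns them; scoring happens on the server.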

Metric-Specific Functions

textcaps_bleu1(results, args=None) through textcaps_bleu4(results, args=None)

Compute BLEU scores at different n-gram levels

BLEU-1 through BLEU-4 for n-gram precision

textcaps_meteor(results, args=None)

Computes METEOR score

Considers synonyms and word order

textcaps_rougel(results, args=None)

Computes ROUGE-L score

Longest common subsequence-based metric

textcaps_cider(results, args=None)

Computes CIDEr score

Consensus-based metric that compares TF-IDF-weighted n-grams against the reference captions

textcaps_spice(results, args=None)

Computes SPICE score

Scene graph-based semantic evaluation
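
Given their signatures, the metric-specific functions are presumably thin wrappers that forward a fixed metric name to the shared aggregation function. The sketch below illustrates that pattern under this assumption; the aggregation function here is a stand-in that only echoes the metric name so the example runs without the COCO libraries.

```python
def textcaps_aggregation_result(results, metric, args=None):
    # Stand-in for the real aggregation, which would tokenize and score.
    # It returns the metric name so the dispatch is visible.
    return metric


def textcaps_bleu4(results, args=None):
    return textcaps_aggregation_result(results, "Bleu_4", args)


def textcaps_meteor(results, args=None):
    return textcaps_aggregation_result(results, "METEOR", args)


def textcaps_cider(results, args=None):
    return textcaps_aggregation_result(results, "CIDEr", args)
```

Keeping one wrapper per metric lets a task configuration refer to each metric by function name while the scoring logic lives in one place.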

Configuration

Active Metrics

TEXTCAPS_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1",
                    "METEOR", "ROUGE_L", "CIDEr"]

SPICE is implemented but commented out of the default metrics list, so it is not computed unless re-enabled.

Submission Files

Validation: textcaps_captions_val2014_alg_results.json
Test: textcaps_captions_test2014_alg_results.json

Both files are generated in the output directory for potential server submission.

Design Characteristics

Multi-Caption Support: Handles multiple reference captions per image
Submission Generation: Automatically creates properly formatted submission files
Standard Metrics: Uses established COCO captioning evaluation framework
Test Set Handling: Separate processing path for test set without metrics
Server Integration: Provides submission instructions for CodaLab evaluation server
Comprehensive Evaluation: Computes seven captioning metrics by default (BLEU-1 through BLEU-4, METEOR, ROUGE-L, CIDEr), with SPICE available but disabled

Dependencies

json - JSON file operations for submission files
os - File path operations
loguru.logger - Logging evaluation progress
pycocoevalcap.eval - COCO metric implementations (Bleu, Cider, Meteor, Rouge) and the COCOEvalCap evaluator
pycocoevalcap.tokenizer.ptbtokenizer.PTBTokenizer - Text tokenization
pycocotools.coco.COCO - COCO dataset handling
lmms_eval.tasks._task_utils.file_utils.generate_submission_file - Submission file path generation

Usage Context

This module supports the TextCaps benchmark, which evaluates models' ability to generate captions that incorporate text visible in images (e.g., store signs, product labels, documents). It provides both local validation metrics and test set submission file generation for official evaluation.
