Implementation:EvolvingLMMs Lab Lmms eval Flickr30k Utils
Location: lmms_eval/tasks/flickr30k/utils.py
Principle: Task_Utility_Functions
Purpose
Task-specific utilities for evaluating image captioning on Flickr30k with the COCO caption evaluation metrics (BLEU, METEOR, ROUGE-L, CIDEr).
Key Functions
flickr_doc_to_visual
def flickr_doc_to_visual(doc)
Extracts the image from the document and converts it to RGB format for model input.
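A minimal sketch of this step, assuming each document stores a PIL-style image under a hypothetical `"image"` key and that the harness expects a list of visuals (both are assumptions, not confirmed by the source):

```python
from typing import Any, Dict, List


def flickr_doc_to_visual(doc: Dict[str, Any]) -> List[Any]:
    # Assumption: doc["image"] holds a PIL.Image.Image; any object with a
    # .convert(mode) method is handled the same way.
    # Converting to "RGB" normalizes grayscale, palette, or RGBA images to 3 channels.
    return [doc["image"].convert("RGB")]
```

Wrapping the single image in a list is an assumption here; it keeps the return type uniform if a task ever supplies several images per document.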
flickr_doc_to_text
def flickr_doc_to_text(doc)
Returns the standard captioning prompt: "Provide a one-sentence caption for the provided image."
flickr_process_result
def flickr_process_result(doc, result)
Processes a single prediction: extracts the image ID, builds a data dict, and returns a dict that files that data dict under each of the 7 Flickr metrics (Bleu_1 to Bleu_4, METEOR, ROUGE_L, CIDEr).
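The per-sample step can be sketched as follows. The document field names (`"img_id"`, `"caption"`), the data-dict keys, and the `flickr_` key prefix are hypothetical choices for illustration, not confirmed details of the source file:

```python
from typing import Any, Dict, List

# The seven metrics named above; SPICE is excluded.
FLICKR_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1", "METEOR", "ROUGE_L", "CIDEr"]


def flickr_process_result(doc: Dict[str, Any], result: List[str]) -> Dict[str, Dict[str, Any]]:
    # Assumption: the model output arrives as a one-element list of strings.
    pred = result[0] if result else ""
    # Hypothetical fields: "img_id" is the image ID, "caption" the list of references.
    data_dict = {"image_id": doc["img_id"], "answer": doc["caption"], "pred": pred}
    # The same record is filed under every metric key, so each per-metric
    # aggregator later receives the full set of predictions.
    return {f"flickr_{metric}": data_dict for metric in FLICKR_METRICS}
```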
flickr_aggregation_result
def flickr_aggregation_result(results, metric, args)
Core aggregation function that:
- Constructs a COCO-format dataset from the results
- Creates the COCO objects and builds their indexes
- Tokenizes the ground-truth and predicted captions with PTBTokenizer
- Computes the score for the specified metric with pycocoevalcap
- Saves a submission file to disk
- Returns the metric score
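The first step above, building COCO-format inputs from the per-sample records, can be sketched in plain Python. The helper names `build_coco_inputs` and `group_by_image` are hypothetical, and the record keys follow the hypothetical layout `{"image_id", "answer", "pred"}`:

```python
from typing import Any, Dict, List, Tuple


def build_coco_inputs(records: List[Dict[str, Any]]) -> Tuple[Dict[str, list], List[Dict[str, Any]]]:
    """Hypothetical helper: turn per-sample records into a COCO-style
    ground-truth dataset plus a list of predictions."""
    dataset: Dict[str, list] = {"images": [], "annotations": []}
    predictions: List[Dict[str, Any]] = []
    ann_id = 0
    for rec in records:
        image_id = rec["image_id"]
        dataset["images"].append({"id": image_id})
        # One annotation per reference caption; annotation IDs must be unique.
        for ref in rec["answer"]:
            dataset["annotations"].append({"id": ann_id, "image_id": image_id, "caption": ref})
            ann_id += 1
        predictions.append({"image_id": image_id, "caption": rec["pred"]})
    return dataset, predictions


def group_by_image(entries: List[Dict[str, Any]]) -> Dict[Any, List[Dict[str, str]]]:
    # Per-image lists of {"caption": ...} dicts, the shape PTBTokenizer.tokenize consumes.
    grouped: Dict[Any, List[Dict[str, str]]] = {}
    for entry in entries:
        grouped.setdefault(entry["image_id"], []).append({"caption": entry["caption"]})
    return grouped
```

In the real pipeline these structures would then be loaded into a `pycocotools.coco.COCO` object (assigning `coco.dataset` and calling `createIndex()`), tokenized, and scored. Those steps are left out of the sketch because pycocoevalcap's PTBTokenizer and METEOR scorer need a Java runtime.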
Metric-Specific Aggregators
- flickr_bleu4(results, args)
- flickr_bleu3(results, args)
- flickr_bleu2(results, args)
- flickr_bleu1(results, args)
- flickr_meteor(results, args)
- flickr_rougel(results, args)
- flickr_cider(results, args)
- flickr_spice(results, args) (SPICE is commented out of the metrics list)
Each one calls flickr_aggregation_result with the corresponding metric name.
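The source defines each wrapper as its own function. The sketch below swaps in a closure factory that produces equivalent thin wrappers; `stub_aggregation` is a hypothetical stand-in for `flickr_aggregation_result` that reports its inputs instead of scoring:

```python
from typing import Any, Callable, Dict, List, Tuple


def stub_aggregation(results: List[Dict[str, Any]], metric: str, args: Any = None) -> Tuple[str, int]:
    # Hypothetical stand-in: instead of scoring with pycocoevalcap, report
    # which metric was requested and how many results were passed in.
    return metric, len(results)


def _make_aggregator(metric: str) -> Callable[..., Tuple[str, int]]:
    # Bind one metric name into a wrapper with the (results, args) signature.
    def aggregator(results: List[Dict[str, Any]], args: Any = None) -> Tuple[str, int]:
        return stub_aggregation(results, metric, args)
    return aggregator


flickr_bleu4 = _make_aggregator("Bleu_4")
flickr_bleu3 = _make_aggregator("Bleu_3")
flickr_bleu2 = _make_aggregator("Bleu_2")
flickr_bleu1 = _make_aggregator("Bleu_1")
flickr_meteor = _make_aggregator("METEOR")
flickr_rougel = _make_aggregator("ROUGE_L")
flickr_cider = _make_aggregator("CIDEr")
```

Either style yields one named module-level callable per metric; the explicit `def` per metric is more verbose but easier to grep for.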
flickr_test_process_result
def flickr_test_process_result(doc, result)
Passthrough processor for the test set: returns the prediction and image ID without computing any scores.
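A minimal sketch of the passthrough, assuming the same hypothetical `"img_id"` field as above and a hypothetical `"flickr_passthrough"` result key:

```python
from typing import Any, Dict, List


def flickr_test_process_result(doc: Dict[str, Any], result: List[str]) -> Dict[str, Dict[str, Any]]:
    # Assumption: the model output arrives as a one-element list of strings.
    pred = result[0] if result else ""
    # No reference captions are attached; only what a submission file needs is kept.
    return {"flickr_passthrough": {"pred": pred, "image_id": doc["img_id"]}}
```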
Implementation Details
- Uses pycocoevalcap for the standard image captioning metrics
- Supports multiple reference captions per image
- Generates submission files for server-side evaluation
- The FLICKR_METRICS constant defines which metrics are evaluated
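One detail when mapping a name in FLICKR_METRICS to a score: pycocoevalcap's `Bleu(4)` scorer returns a list with one corpus score per n-gram order, while the other scorers return a single float. The helper `select_score` below is a hypothetical sketch of how a metric name such as `"Bleu_3"` can pick the right entry:

```python
from typing import List, Union

# The seven evaluated metrics; "SPICE" is left out, matching the note above.
FLICKR_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1", "METEOR", "ROUGE_L", "CIDEr"]


def select_score(metric: str, score: Union[float, List[float]]) -> float:
    # Bleu(4) yields [BLEU-1, BLEU-2, BLEU-3, BLEU-4]; index by the n-gram order
    # parsed from the metric name ("Bleu_3" -> 3 -> index 2).
    if isinstance(score, list):
        n = int(metric.split("_")[-1])
        return score[n - 1]
    # METEOR, ROUGE_L, and CIDEr already return one corpus-level float.
    return score
```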