Implementation:EvolvingLMMs Lab Lmms eval Flickr30k Utils

From Leeroopedia

Location: /tmp/kapso_repo_sslb_59s/lmms_eval/tasks/flickr30k/utils.py

Principle: Task_Utility_Functions

Purpose

Task-specific utilities for evaluating image captioning on Flickr30k with the standard COCO caption metrics (BLEU, METEOR, ROUGE-L, CIDEr).

Key Functions

flickr_doc_to_visual

def flickr_doc_to_visual(doc)

Extracts the image from the document and converts it to RGB for model input.

flickr_doc_to_text

def flickr_doc_to_text(doc)

Returns the standard captioning prompt: "Provide a one-sentence caption for the provided image."

flickr_process_result

def flickr_process_result(doc, result)

Processes a single prediction: extracts the image ID and builds a data dict. Returns a metrics dict with an entry for each of the 7 Flickr metrics (Bleu_1 through Bleu_4, METEOR, ROUGE_L, CIDEr).
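The per-sample step can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the doc field names (`img_id`, `caption`), the `flickr_` key prefix, and the function name `process_result_sketch` are assumptions made for the example.

```python
# Metric names as listed above; the constant name follows the page's FLICKR_METRICS.
FLICKR_METRICS = ["Bleu_4", "Bleu_3", "Bleu_2", "Bleu_1", "METEOR", "ROUGE_L", "CIDEr"]


def process_result_sketch(doc, result):
    """Pair one prediction with its references under every metric key."""
    # Assumption: `result` is a list whose first element is the generated caption.
    pred = result[0] if result else ""
    # Assumption: the doc stores the image ID under "img_id" and the
    # reference captions under "caption" (hypothetical field names).
    data = {"image_id": int(doc["img_id"]), "answer": doc["caption"], "pred": pred}
    # Every metric key receives the same payload; no scoring happens here.
    return {f"flickr_{name}": data for name in FLICKR_METRICS}


example = process_result_sketch(
    {"img_id": "42", "caption": ["A dog runs.", "A dog is running."]},
    ["A dog running on grass."],
)
```

Attaching the same payload to every metric key defers all scoring to the aggregation stage, where the whole result set is available at once.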

flickr_aggregation_result

def flickr_aggregation_result(results, metric, args)

Core aggregation function that:

  • Constructs COCO-format dataset from results
  • Creates COCO objects and indexes
  • Tokenizes ground truth and predictions using PTBTokenizer
  • Computes specified metric score using pycocoevalcap
  • Saves submission file to disk
  • Returns metric score
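The first step in the list above, building a COCO-format dataset, might look like the sketch below. It assumes each result is a dict with `image_id`, `answer` (a list of reference captions), and `pred` keys; those key names and the helper name `build_coco_inputs` are hypothetical.

```python
def build_coco_inputs(results):
    """Turn per-sample results into a COCO-style reference set and a prediction list."""
    dataset = {"annotations": [], "images": []}
    predictions = []
    ann_id = 0
    for res in results:
        image_id = int(res["image_id"])
        predictions.append({"image_id": image_id, "caption": res["pred"]})
        # One annotation per reference caption, so an image may have several.
        for ref in res["answer"]:
            dataset["annotations"].append(
                {"image_id": image_id, "caption": ref, "id": ann_id}
            )
            ann_id += 1
        dataset["images"].append({"id": image_id})
    return dataset, predictions


dataset, predictions = build_coco_inputs([
    {"image_id": 1, "answer": ["A cat sits.", "A cat on a mat."], "pred": "A cat."},
    {"image_id": 2, "answer": ["Two men talk."], "pred": "Men talking."},
])
```

The remaining steps (indexing with a COCO object, tokenizing with PTBTokenizer, and running the chosen scorer) are left out here because they depend on pycocotools and pycocoevalcap, and the tokenizer additionally requires a Java runtime.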

Metric-Specific Aggregators

  • flickr_bleu4(results, args)
  • flickr_bleu3(results, args)
  • flickr_bleu2(results, args)
  • flickr_bleu1(results, args)
  • flickr_meteor(results, args)
  • flickr_rougel(results, args)
  • flickr_cider(results, args)
  • flickr_spice(results, args) (SPICE is commented out in the metrics list)

Each calls flickr_aggregation_result with the appropriate metric name.
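The wrapper pattern can be illustrated with a stand-in aggregator. In this sketch `aggregation_stub` is a hypothetical replacement for flickr_aggregation_result that only records what it was asked to compute, so the example runs without pycocoevalcap.

```python
def aggregation_stub(results, metric, args):
    # Hypothetical stand-in: reports the request instead of computing a score.
    return {"metric": metric, "n_results": len(results)}


def bleu4_sketch(results, args):
    # Thin wrapper: fixes the metric name and forwards everything else.
    return aggregation_stub(results, "Bleu_4", args)


def cider_sketch(results, args):
    return aggregation_stub(results, "CIDEr", args)


outcome = cider_sketch([{"pred": "A cat."}], args=None)
```

Keeping one function per metric gives each metric a callable with a uniform `(results, args)` signature, while the shared logic stays in one place.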

flickr_test_process_result

def flickr_test_process_result(doc, result)

Passthrough processor for the test set; returns the prediction and image ID without scoring.
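A passthrough of this kind could be as small as the sketch below. The result key `flickr_passthrough` and the doc field `img_id` are assumptions for illustration, not names confirmed by this page.

```python
def passthrough_result_sketch(doc, result):
    """Forward the prediction and image ID unchanged, with no metric computation."""
    # Assumption: `result` is a list whose first element is the generated caption.
    pred = result[0] if result else ""
    # Assumption: the image ID lives under "img_id" (hypothetical field name).
    return {"flickr_passthrough": {"pred": pred, "image_id": doc["img_id"]}}


passed = passthrough_result_sketch({"img_id": 7}, ["A man rides a bike."])
```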

Implementation Details

  • Uses pycocoevalcap for standard image captioning metrics
  • Supports multiple reference captions per image
  • Generates submission files for server evaluation
  • The FLICKR_METRICS constant defines which metrics are evaluated
