Implementation: mlfoundations/open_flamingo captioning evaluation (evaluate_captioning)
Overview
A concrete tool from the OpenFlamingo evaluation module for running few-shot captioning evaluation with CIDEr scoring on the COCO and Flickr30K benchmarks.
Description
The evaluate_captioning() function: (1) loads train and test splits of CaptionDataset, (2) selects few-shot examples (random or RICES), (3) constructs prompts with <image>Output:{caption}<|endofchunk|> format, (4) generates captions via eval_model.get_outputs() with beam search, (5) gathers predictions across distributed ranks, (6) computes CIDEr score via compute_cider() using pycocoevalcap. The companion compute_cider() function wraps pycocoevalcap's COCOEvalCap.
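The prompt format in step (3) can be sketched as follows. The helper name below is hypothetical (the real code builds the prompt string inline), but the `<image>Output:{caption}<|endofchunk|>` template matches the description above:

```python
def build_caption_prompt(shot_captions):
    """Build an OpenFlamingo-style few-shot captioning prompt.

    Each in-context example is rendered as
    <image>Output:{caption}<|endofchunk|>, and the final query image
    ends with a bare "Output:" for the model to complete.
    (Hypothetical helper for illustration only.)
    """
    context = "".join(
        f"<image>Output:{caption.strip()}<|endofchunk|>"
        for caption in shot_captions
    )
    # Append the query image with an open-ended "Output:" cue.
    return context + "<image>Output:"

# Example: a 2-shot prompt.
prompt = build_caption_prompt(["A dog runs.", "A cat sleeps."])
```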
Usage
Called from the main evaluation loop for COCO and Flickr30K captioning benchmarks.
Code Reference
Source: Repository https://github.com/mlfoundations/open_flamingo, File: open_flamingo/eval/evaluate.py Lines L728-896 (evaluate_captioning), open_flamingo/eval/coco_metric.py Lines L1-22 (compute_cider)
Signature:
def evaluate_captioning(
    args: argparse.Namespace,
    eval_model: BaseEvalModel,
    seed: int = 42,
    min_generation_length: int = 0,
    max_generation_length: int = 20,
    num_beams: int = 3,
    length_penalty: float = 0.0,
    num_shots: int = 8,
    dataset_name: str = "coco",
    cached_features=None,
) -> float:
    """Returns CIDEr score * 100"""

def compute_cider(result_path: str, annotations_path: str) -> Dict[str, float]:
    """Returns dict with CIDEr, BLEU, METEOR, ROUGE_L, SPICE scores"""
Import:
from open_flamingo.eval.evaluate import evaluate_captioning
from open_flamingo.eval.coco_metric import compute_cider
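compute_cider() expects result_path to point at a JSON file in the standard COCO caption results format, a list of `{"image_id", "caption"}` records that pycocotools' COCO.loadRes (used under the hood by COCOEvalCap) can consume. A minimal sketch of writing such a file; the helper name is an assumption, not part of the library:

```python
import json

def write_coco_results(predictions, result_path):
    """Serialize predictions to the COCO caption results format:
    a JSON list of {"image_id": int, "caption": str} entries.
    (Hypothetical helper sketching the expected file layout.)"""
    records = [
        {"image_id": int(image_id), "caption": caption}
        for image_id, caption in predictions.items()
    ]
    with open(result_path, "w") as f:
        json.dump(records, f)

# One predicted caption keyed by COCO image id.
write_coco_results({42: "a dog on a beach"}, "results.json")
```

The resulting file is what you would pass as result_path to compute_cider(), alongside the dataset's annotations JSON as annotations_path.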
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | Eval config with dataset paths |
| eval_model | BaseEvalModel | Yes | Model wrapper |
| seed | int | No | Random seed (default 42) |
| num_shots | int | No | Few-shot examples (default 8) |
| dataset_name | str | No | "coco" or "flickr" (default "coco") |
| num_beams | int | No | Beam width (default 3) |
| max_generation_length | int | No | Max caption tokens (default 20) |
| cached_features | Tensor | No | RICES cached features |
Outputs
| Type | Description |
|---|---|
| float | CIDEr score multiplied by 100 |
Usage Examples
# Run 8-shot captioning evaluation on COCO
cider_score = evaluate_captioning(
    args=args,
    eval_model=eval_model,
    seed=42,
    num_shots=8,
    dataset_name="coco",
    num_beams=3,
    max_generation_length=20,
)
print(f"COCO CIDEr score: {cider_score:.2f}")
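In distributed runs, step (5) concatenates per-rank prediction lists, which can contain duplicate image ids when the distributed sampler pads the last batch. A hedged sketch of the merge step (the gather itself would use torch.distributed and is omitted here; the helper name is hypothetical):

```python
def merge_rank_predictions(per_rank_predictions):
    """Merge prediction lists gathered from all ranks, keeping the
    first caption seen for each image_id. Duplicates can arise from
    distributed-sampler padding of the final batch.
    (Sketch only; not the exact code from evaluate.py.)"""
    merged = {}
    for rank_preds in per_rank_predictions:
        for pred in rank_preds:
            merged.setdefault(pred["image_id"], pred["caption"])
    return [{"image_id": i, "caption": c} for i, c in merged.items()]

rank0 = [{"image_id": 1, "caption": "a red bus"}]
rank1 = [{"image_id": 2, "caption": "two cats"},
         {"image_id": 1, "caption": "a red bus"}]  # padded duplicate
merged = merge_rank_predictions([rank0, rank1])
```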