Environment: mlfoundations/open_flamingo Evaluation Dependencies
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Evaluation, Computer_Vision |
| Last Updated | 2026-02-08 03:30 GMT |
Overview
This environment provides evaluation-specific dependencies, including `pycocoevalcap` for CIDEr captioning metrics, `nltk` for text processing, `scikit-learn` for ROC-AUC scoring, and `scipy` for mathematical operations.
Description
This environment extends the base OpenFlamingo dependencies with evaluation-specific packages: `pycocoevalcap` provides COCO captioning evaluation metrics (CIDEr); `pycocotools` provides the COCO API for dataset access; `nltk` handles text preprocessing for VQA evaluation; `scikit-learn` provides the ROC-AUC metric for Hateful Memes classification; and `inflection` normalizes VQA answers for scoring.
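The repository's exact VQA normalization rules are not reproduced here; as a hedged sketch of what answer normalization for scoring typically involves (the function name and rules below are illustrative assumptions, not the repo's implementation), a minimal normalizer might lowercase, strip punctuation, and collapse whitespace so answer variants compare equal:

```python
import re
import string

def normalize_vqa_answer(answer: str) -> str:
    """Illustrative VQA answer normalizer (hypothetical; not the repo's exact rules).

    Lowercases, removes punctuation, and collapses whitespace so that
    answers like "A  Dog!" and "a dog" compare equal during scoring.
    """
    answer = answer.lower().strip()
    # Drop punctuation characters entirely.
    answer = answer.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace to single spaces.
    return re.sub(r"\s+", " ", answer)

print(normalize_vqa_answer("A  Dog!"))  # -> "a dog"
```

The actual pipeline additionally uses `inflection` (e.g. for singular/plural handling), which this stdlib-only sketch omits.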
Usage
Use this environment for the Few-Shot Evaluation workflow. It is required for running captioning evaluation (CIDEr on COCO/Flickr30K), VQA evaluation (accuracy on VQAv2/OK-VQA/VizWiz/TextVQA), and classification evaluation (top-1 accuracy on ImageNet, ROC-AUC on Hateful Memes).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Disk | Varies by benchmark | COCO images ~20GB, ImageNet ~150GB, others 1-10GB each |
| RAM | 32GB+ recommended | RICES feature caching loads all features into memory |
Dependencies
Python Packages
- `pycocoevalcap`
- `pycocotools`
- `scipy`
- `torchvision`
- `nltk`
- `inflection`
- `scikit-learn`
- `tqdm`
- `requests`
Development Packages
- `black`
- `mypy`
- `pylint`
- `pytest`
Credentials
No credentials are required. Benchmark datasets must be pre-downloaded to local disk.
Quick Install
# Install evaluation extras via setup.py
pip install -e ".[eval]"
# Or install manually
pip install pycocoevalcap pycocotools scipy torchvision nltk inflection scikit-learn tqdm requests
# Or from requirements file
pip install -r requirements-eval.txt
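After installing, it can be useful to confirm every evaluation dependency is importable. The sketch below (a generic checker, not a script shipped with the repo) maps pip package names to import names — note that `scikit-learn` installs as the `sklearn` module — and reports anything missing:

```python
import importlib.util

# Map pip package names to their import names where they differ.
EVAL_IMPORTS = {
    "pycocoevalcap": "pycocoevalcap",
    "pycocotools": "pycocotools",
    "scipy": "scipy",
    "torchvision": "torchvision",
    "nltk": "nltk",
    "inflection": "inflection",
    "scikit-learn": "sklearn",
    "tqdm": "tqdm",
    "requests": "requests",
}

def missing_packages(import_map: dict) -> list:
    """Return pip names whose import cannot be found in this environment."""
    return [pip for pip, mod in import_map.items()
            if importlib.util.find_spec(mod) is None]

if __name__ == "__main__":
    missing = missing_packages(EVAL_IMPORTS)
    if missing:
        print("Missing eval dependencies:", ", ".join(missing))
    else:
        print("All eval dependencies found.")
```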
Code Evidence
Evaluation extras from `setup.py:19-27`:
EVAL = [
"scipy",
"torchvision",
"nltk",
"inflection",
"pycocoevalcap",
"pycocotools",
"tqdm",
]
CIDEr metric usage from `open_flamingo/eval/coco_metric.py`:
from pycocoevalcap.eval import COCOEvalCap
ROC-AUC scoring for Hateful Memes from `open_flamingo/eval/evaluate.py:11`:
from sklearn.metrics import roc_auc_score
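`roc_auc_score` takes binary ground-truth labels and predicted scores. A toy illustration (invented numbers, not Hateful Memes data):

```python
from sklearn.metrics import roc_auc_score

# Toy binary labels and model confidence scores (illustrative only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC = fraction of (positive, negative) pairs ranked correctly: 3 of 4 here.
print(roc_auc_score(y_true, y_score))  # -> 0.75
```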
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: pycocoevalcap` | pycocoevalcap not installed | `pip install pycocoevalcap pycocotools` |
| `FileNotFoundError` on dataset paths | Benchmark data not downloaded | Download COCO, ImageNet, etc. to paths specified in CLI args |
| `Only 0 shot eval is supported for non-open_flamingo models` | Trying few-shot eval with BLIP | Use `--shots 0` for non-OpenFlamingo models |
| `Number of trial seeds must be == number of trials` | Mismatch between `--num_trials` and `--trial_seeds` | Provide matching counts |
Compatibility Notes
- BLIP-2 support: Evaluation framework supports BLIP-2 via separate model wrapper, but only zero-shot evaluation is supported for non-OpenFlamingo models.
- RICES features: Can be pre-cached to disk as `.pkl` files via `cache_rices_features.py` to avoid recomputation.
- VQA test-dev: For VQAv2 and VizWiz, when no test annotations are available, results are formatted for EvalAI submission.
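The exact cache layout written by `cache_rices_features.py` is not documented here; as a generic sketch of the load-or-compute pattern behind `.pkl` feature caching (file names and the `compute_fn` hook are assumptions for illustration):

```python
import os
import pickle

def save_features(features, path: str) -> None:
    """Serialize a feature object (e.g. an array of embeddings) to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(features, f)

def load_or_compute(path: str, compute_fn):
    """Load cached features if the file exists; otherwise compute and cache them."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    features = compute_fn()  # expensive step, run only on cache miss
    save_features(features, path)
    return features

# Usage sketch (hypothetical names):
# features = load_or_compute("coco_train_features.pkl", extract_features)
```

Note the RAM requirement above: whatever is cached to disk is still loaded fully into memory at evaluation time.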