
Environment:Mlfoundations Open flamingo Evaluation Dependencies

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Evaluation, Computer_Vision
Last Updated 2026-02-08 03:30 GMT

Overview

This environment bundles evaluation-specific dependencies: `pycocoevalcap` for CIDEr captioning metrics, `nltk` for text preprocessing, `scikit-learn` for ROC-AUC scoring, and `scipy` for numerical operations.

Description

This environment extends the base OpenFlamingo dependencies with evaluation-specific packages. `pycocoevalcap` provides the COCO captioning evaluation metrics (CIDEr). `pycocotools` provides the COCO API for dataset access. `nltk` handles text preprocessing for VQA evaluation. `scikit-learn` provides the ROC-AUC metric for Hateful Memes classification. The `inflection` package normalizes VQA answers for scoring.
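As an illustration of the `scikit-learn` dependency, here is a minimal sketch of ROC-AUC scoring as it is used for Hateful Memes classification. The labels and probabilities are made up for illustration; the real evaluation scores model-assigned "hateful" probabilities against the benchmark labels.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical binary labels (1 = hateful) and model probabilities
labels = [0, 1, 1, 0]
probs = [0.10, 0.80, 0.65, 0.30]

# ROC-AUC: probability that a random positive outranks a random negative
auc = roc_auc_score(labels, probs)
print(auc)  # 1.0 here, since every positive outscores every negative
```

Because the metric is rank-based, it is insensitive to any fixed decision threshold on the probabilities.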

Usage

Use this environment for the Few-Shot Evaluation workflow. It is required for running captioning evaluation (CIDEr on COCO/Flickr30K), VQA evaluation (accuracy on VQAv2/OK-VQA/VizWiz/TextVQA), and classification evaluation (top-1 accuracy on ImageNet, ROC-AUC on Hateful Memes).
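The VQA benchmarks score answers with the standard VQA accuracy rule: a prediction matching at least 3 of the 10 annotator answers scores 1.0. Below is a simplified, pure-Python sketch of the normalization-and-scoring idea; the real pipeline uses `nltk` and `inflection` and handles many more cases (number words, contractions, etc.), so treat this as illustrative only.

```python
import re

ARTICLES = {"a", "an", "the"}

def normalize_answer(ans: str) -> str:
    # Simplified sketch: lowercase, strip punctuation, drop articles.
    ans = re.sub(r"[^\w\s]", "", ans.lower().strip())
    return " ".join(w for w in ans.split() if w not in ARTICLES)

def vqa_accuracy(pred: str, annotator_answers: list) -> float:
    # Standard VQA accuracy: min(#matching annotators / 3, 1.0)
    pred = normalize_answer(pred)
    matches = sum(normalize_answer(a) == pred for a in annotator_answers)
    return min(matches / 3.0, 1.0)
```

For example, `vqa_accuracy("The cat!", ["cat", "cat", "dog", "cat"])` normalizes "The cat!" to "cat", matches 3 annotators, and scores 1.0.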

System Requirements

Category Requirement Notes
Disk Varies by benchmark COCO images ~20GB, ImageNet ~150GB, others 1-10GB each
RAM 32GB+ recommended RICES feature caching loads all features into memory
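The RAM note above refers to RICES (Retrieval-based In-Context Example Selection), which keeps the cached demonstration features in memory and retrieves the nearest examples by cosine similarity. A minimal NumPy sketch of the retrieval step follows; the function name and array shapes are illustrative, not the project's API.

```python
import numpy as np

def rices_topk(query_feat: np.ndarray, bank_feats: np.ndarray, k: int) -> np.ndarray:
    # Cosine similarity between one query feature and the cached bank
    q = query_feat / np.linalg.norm(query_feat)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    sims = b @ q
    # Indices of the k most similar in-context examples
    return np.argsort(-sims)[:k]
```

Because the whole `bank_feats` matrix must be resident for the matrix product, memory scales with the number of cached examples times the feature dimension, which is why 32GB+ RAM is recommended.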

Dependencies

Python Packages

  • `pycocoevalcap`
  • `pycocotools`
  • `scipy`
  • `torchvision`
  • `nltk`
  • `inflection`
  • `scikit-learn`
  • `tqdm`
  • `requests`

Development Packages

  • `black`
  • `mypy`
  • `pylint`
  • `pytest`

Credentials

No credentials are required. Benchmark datasets must be pre-downloaded to local disk.

Quick Install

# Install evaluation extras via setup.py
pip install -e ".[eval]"

# Or install manually
pip install pycocoevalcap pycocotools scipy torchvision nltk inflection scikit-learn tqdm requests

# Or from requirements file
pip install -r requirements-eval.txt

Code Evidence

Evaluation extras from `setup.py:19-27`:

EVAL = [
    "scipy",
    "torchvision",
    "nltk",
    "inflection",
    "pycocoevalcap",
    "pycocotools",
    "tqdm",
]

CIDEr metric usage from `open_flamingo/eval/coco_metric.py`:

from pycocoevalcap.eval import COCOEvalCap

ROC-AUC scoring for Hateful Memes from `open_flamingo/eval/evaluate.py:11`:

from sklearn.metrics import roc_auc_score

Common Errors

Error Message Cause Solution
`ImportError: pycocoevalcap` pycocoevalcap not installed `pip install pycocoevalcap pycocotools`
`FileNotFoundError` on dataset paths Benchmark data not downloaded Download COCO, ImageNet, etc. to paths specified in CLI args
`Only 0 shot eval is supported for non-open_flamingo models` Trying few-shot eval with BLIP Use `--shots 0` for non-OpenFlamingo models
`Number of trial seeds must be == number of trials` Mismatch between `--num_trials` and `--trial_seeds` Provide matching counts

Compatibility Notes

  • BLIP-2 support: Evaluation framework supports BLIP-2 via separate model wrapper, but only zero-shot evaluation is supported for non-OpenFlamingo models.
  • RICES features: Can be pre-cached to disk as `.pkl` files via `cache_rices_features.py` to avoid recomputation.
  • VQA test-dev: For VQAv2 and VizWiz, when no test annotations are available, results are formatted for EvalAI submission.
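The RICES feature-caching note above can be sketched as follows. The file name, shapes, and storage format here are assumptions for illustration; `cache_rices_features.py` stores precomputed image features, but its exact on-disk layout is not shown in this page.

```python
import os
import pickle
import tempfile

import numpy as np

# Illustrative matrix standing in for precomputed RICES image features
features = np.ones((8, 512), dtype="float32")

# Write the cache once, then reload it on later runs to skip recomputation
cache_path = os.path.join(tempfile.mkdtemp(), "rices_features.pkl")
with open(cache_path, "wb") as f:
    pickle.dump(features, f)

with open(cache_path, "rb") as f:
    loaded = pickle.load(f)

print(loaded.shape)  # (8, 512)
```

Caching trades disk space for startup time: each subsequent evaluation run deserializes the `.pkl` file instead of re-encoding the entire demonstration pool.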
