
Implementation:Mlfoundations Open flamingo Evaluate classification

From Leeroopedia



Overview

A concrete tool for running few-shot classification evaluation with log-probability scoring on ImageNet and Hateful Memes, provided by the OpenFlamingo evaluation module.

Description

The evaluate_classification() function:

  1. Loads ImageNetDataset or HatefulMemesDataset
  2. Selects few-shot demonstrations
  3. Constructs prompts with "<image>A photo of a {class_name}.<|endofchunk|>" format for ImageNet or "<image>is an image with: '{ocr}' written on it. Is it hateful? {label}<|endofchunk|>" for Hateful Memes
  4. Calls eval_model.get_rank_classifications() to score all class names
  5. Selects top-1 prediction
  6. Gathers predictions across ranks
  7. Computes accuracy (ImageNet) or ROC-AUC (Hateful Memes)

Optional prompt ensembling averages scores over permutations of in-context examples.
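The scoring mechanism in steps 3-5, together with the optional ensembling over demo permutations, can be sketched as follows. This is a toy analogue, not the actual get_rank_classifications() implementation: the logprob_fn callable and the string-based prompts are simplified stand-ins for the model's token-level log-probability scoring.

```python
import itertools

def rank_classify(logprob_fn, demo_classes, class_names, ensemble=False):
    """Toy analogue of steps 3-5: build the few-shot prompt, score every
    candidate class name by the log-probability assigned to it after the
    context, and return the top-1 class. With ensemble=True, scores are
    averaged over permutations of the in-context examples."""
    orderings = (itertools.permutations(demo_classes) if ensemble
                 else [tuple(demo_classes)])
    totals = {name: 0.0 for name in class_names}
    n_orders = 0
    for order in orderings:
        # ImageNet-style prompt format from step 3
        context = "".join(
            f"<image>A photo of a {c}.<|endofchunk|>" for c in order)
        for name in class_names:
            totals[name] += logprob_fn(context + f"<image>A photo of a {name}.")
        n_orders += 1
    return max(class_names, key=lambda c: totals[c] / n_orders)

# Stand-in scorer: a real run would sum the model's token log-probs.
toy_logprob = lambda prompt: 0.0 if prompt.endswith("a dog.") else -5.0
prediction = rank_classify(toy_logprob, ["cat", "dog"], ["cat", "dog", "fish"])
```

With ensemble=True the same function averages each class score over every ordering of the demonstrations, which is the prompt-ensembling behavior described above.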

Usage

Called from the main evaluation loop for classification benchmarks.
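The surrounding call pattern can be sketched as below. run_benchmarks and its parameters are hypothetical names, shown only to illustrate the shots-times-seeds sweep that typically wraps each call to the evaluation function.

```python
def run_benchmarks(evaluate_fn, datasets=("imagenet", "hateful_memes"),
                   shot_counts=(0, 4, 8), seeds=(0, 1, 2)):
    """Run one evaluation per (dataset, shots, seed) and average the
    per-seed scores -- an illustrative trials-then-mean driver loop,
    not the repository's actual main() code."""
    results = {}
    for name in datasets:
        for shots in shot_counts:
            trials = [evaluate_fn(seed=s, num_shots=shots, dataset_name=name)
                      for s in seeds]
            results[(name, shots)] = sum(trials) / len(trials)
    return results

# Stub in place of evaluate_classification() to show the output shape.
stub = lambda seed, num_shots, dataset_name: 0.5
scores = run_benchmarks(stub)
```

A real driver would pass evaluate_classification (with args and eval_model bound) in place of the stub.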

Code Reference

Source
Repository: https://github.com/mlfoundations/open_flamingo
File: open_flamingo/eval/evaluate.py
Lines: 1118-1297
Signature

def evaluate_classification(
    args: argparse.Namespace,
    eval_model,
    seed: int = 42,
    num_shots: int = 8,
    dataset_name: str = "imagenet",
    cached_features=None,
    no_kv_caching: bool = False,
    use_prompt_ensembling: bool = False,
) -> float:
    """Returns top-1 accuracy for ImageNet, ROC-AUC for Hateful Memes"""

Import
from open_flamingo.eval.evaluate import evaluate_classification

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| args | argparse.Namespace | Yes | Eval config with dataset paths |
| eval_model | BaseEvalModel | Yes | Model wrapper |
| seed | int | No | Random seed (default 42) |
| num_shots | int | No | Number of few-shot demonstrations (default 8) |
| dataset_name | str | No | "imagenet" or "hateful_memes" (default "imagenet") |
| cached_features | Tensor | No | Precomputed RICES features |
| no_kv_caching | bool | No | Disable KV cache optimization |
| use_prompt_ensembling | bool | No | Average scores over demo permutations |

Outputs

| Type | Description |
|------|-------------|
| float | Top-1 accuracy for ImageNet or ROC-AUC for Hateful Memes |
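The metric dispatch behind this return value can be sketched as follows, assuming predicted class ids for ImageNet and real-valued "hateful" scores for Hateful Memes. The Mann-Whitney formulation below is one standard way to compute ROC-AUC; the actual code may instead call a library routine such as sklearn.metrics.roc_auc_score.

```python
def final_score(dataset_name, labels, outputs):
    """Toy metric dispatch: top-1 accuracy for ImageNet (outputs are
    predicted class ids), ROC-AUC for Hateful Memes (outputs are
    real-valued scores for the positive 'hateful' class)."""
    if dataset_name == "imagenet":
        return sum(p == y for p, y in zip(outputs, labels)) / len(labels)
    # ROC-AUC via the Mann-Whitney statistic: the probability that a
    # random positive example outscores a random negative one,
    # counting ties as half a win.
    pos = [s for s, y in zip(outputs, labels) if y == 1]
    neg = [s for s, y in zip(outputs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```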

Usage Examples

# Evaluate on ImageNet with 8-shot demonstrations
accuracy = evaluate_classification(
    args=args,
    eval_model=eval_model,
    seed=42,
    num_shots=8,
    dataset_name="imagenet",
    no_kv_caching=False,
    use_prompt_ensembling=False,
)
print(f"ImageNet top-1 accuracy: {accuracy:.4f}")

Related Pages

Principle:Mlfoundations_Open_flamingo_Classification_Evaluation
