
Implementation:Mlfoundations Open flamingo Evaluate classification

From Leeroopedia



Overview

A concrete tool for running few-shot classification evaluation with log-probability scoring on ImageNet and Hateful Memes, provided by the OpenFlamingo evaluation module.

Description

The evaluate_classification() function:

  1. Loads ImageNetDataset or HatefulMemesDataset
  2. Selects few-shot demonstrations
  3. Constructs prompts with "<image>A photo of a {class_name}.<|endofchunk|>" format for ImageNet or "<image>is an image with: '{ocr}' written on it. Is it hateful? {label}<|endofchunk|>" for Hateful Memes
  4. Calls eval_model.get_rank_classifications() to score all class names
  5. Selects top-1 prediction
  6. Gathers predictions across ranks
  7. Computes accuracy (ImageNet) or ROC-AUC (Hateful Memes)

Optional prompt ensembling averages scores over permutations of in-context examples.
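The scoring mechanism in steps 3-5, together with the optional ensembling over demo permutations, can be sketched as follows. This is a toy analogue, not the actual get_rank_classifications() implementation: the logprob_fn callable and the string-based prompts are simplified stand-ins for the model's token-level log-probability scoring.

```python
import itertools

def rank_classify(logprob_fn, demo_classes, class_names, ensemble=False):
    """Toy analogue of steps 3-5: build the few-shot prompt, score every
    candidate class name by the log-probability assigned to it after the
    context, and return the top-1 class. With ensemble=True, scores are
    averaged over permutations of the in-context examples."""
    orderings = (itertools.permutations(demo_classes) if ensemble
                 else [tuple(demo_classes)])
    totals = {name: 0.0 for name in class_names}
    n_orders = 0
    for order in orderings:
        # ImageNet-style prompt format from step 3
        context = "".join(
            f"<image>A photo of a {c}.<|endofchunk|>" for c in order)
        for name in class_names:
            totals[name] += logprob_fn(context + f"<image>A photo of a {name}.")
        n_orders += 1
    return max(class_names, key=lambda c: totals[c] / n_orders)

# Stand-in scorer: a real run would sum the model's token log-probs.
toy_logprob = lambda prompt: 0.0 if prompt.endswith("a dog.") else -5.0
prediction = rank_classify(toy_logprob, ["cat", "dog"], ["cat", "dog", "fish"])
```

With ensemble=True the same function averages each class score over every ordering of the demonstrations, which is the prompt-ensembling behavior described above.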

Usage

Called from the main evaluation loop for classification benchmarks.
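The surrounding call pattern can be sketched as below. run_benchmarks and its parameters are hypothetical names, shown only to illustrate the shots-times-seeds sweep that typically wraps each call to the evaluation function.

```python
def run_benchmarks(evaluate_fn, datasets=("imagenet", "hateful_memes"),
                   shot_counts=(0, 4, 8), seeds=(0, 1, 2)):
    """Run one evaluation per (dataset, shots, seed) and average the
    per-seed scores -- an illustrative trials-then-mean driver loop,
    not the repository's actual main() code."""
    results = {}
    for name in datasets:
        for shots in shot_counts:
            trials = [evaluate_fn(seed=s, num_shots=shots, dataset_name=name)
                      for s in seeds]
            results[(name, shots)] = sum(trials) / len(trials)
    return results

# Stub in place of evaluate_classification() to show the output shape.
stub = lambda seed, num_shots, dataset_name: 0.5
scores = run_benchmarks(stub)
```

A real driver would pass evaluate_classification (with args and eval_model bound) in place of the stub.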

Code Reference

Source
Repository: https://github.com/mlfoundations/open_flamingo
File: open_flamingo/eval/evaluate.py
Lines: 1118-1297
Signature

def evaluate_classification(
    args: argparse.Namespace,
    eval_model,
    seed: int = 42,
    num_shots: int = 8,
    dataset_name: str = "imagenet",
    cached_features=None,
    no_kv_caching: bool = False,
    use_prompt_ensembling: bool = False,
) -> float:
    """Returns top-1 accuracy for ImageNet, ROC-AUC for Hateful Memes"""

Import
from open_flamingo.eval.evaluate import evaluate_classification

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| args | argparse.Namespace | Yes | Eval config with dataset paths |
| eval_model | BaseEvalModel | Yes | Model wrapper |
| seed | int | No | Random seed (default 42) |
| num_shots | int | No | Number of few-shot demonstrations (default 8) |
| dataset_name | str | No | "imagenet" or "hateful_memes" (default "imagenet") |
| cached_features | Tensor | No | Precomputed RICES features |
| no_kv_caching | bool | No | Disable KV cache optimization |
| use_prompt_ensembling | bool | No | Average scores over demo permutations |

Outputs

| Type | Description |
|------|-------------|
| float | Top-1 accuracy for ImageNet or ROC-AUC for Hateful Memes |
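The metric dispatch behind this return value can be sketched as follows, assuming predicted class ids for ImageNet and real-valued "hateful" scores for Hateful Memes. The Mann-Whitney formulation below is one standard way to compute ROC-AUC; the actual code may instead call a library routine such as sklearn.metrics.roc_auc_score.

```python
def final_score(dataset_name, labels, outputs):
    """Toy metric dispatch: top-1 accuracy for ImageNet (outputs are
    predicted class ids), ROC-AUC for Hateful Memes (outputs are
    real-valued scores for the positive 'hateful' class)."""
    if dataset_name == "imagenet":
        return sum(p == y for p, y in zip(outputs, labels)) / len(labels)
    # ROC-AUC via the Mann-Whitney statistic: the probability that a
    # random positive example outscores a random negative one,
    # counting ties as half a win.
    pos = [s for s, y in zip(outputs, labels) if y == 1]
    neg = [s for s, y in zip(outputs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```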

Usage Examples

# Evaluate on ImageNet with 8-shot demonstrations
accuracy = evaluate_classification(
    args=args,
    eval_model=eval_model,
    seed=42,
    num_shots=8,
    dataset_name="imagenet",
    no_kv_caching=False,
    use_prompt_ensembling=False,
)
print(f"ImageNet top-1 accuracy: {accuracy:.4f}")

Related Pages

Principle:Mlfoundations_Open_flamingo_Classification_Evaluation
