Implementation:Mlfoundations Open flamingo Evaluate classification
Overview
A concrete tool for running few-shot classification evaluation with log-probability scoring on ImageNet and Hateful Memes, provided by the OpenFlamingo evaluation module.
Description
The evaluate_classification() function:
- Loads ImageNetDataset or HatefulMemesDataset
- Selects few-shot demonstrations
- Constructs prompts using `"<image>A photo of a {class_name}.<|endofchunk|>"` for ImageNet, or `"<image>is an image with: '{ocr}' written on it. Is it hateful? {label}<|endofchunk|>"` for Hateful Memes
- Calls `eval_model.get_rank_classifications()` to score all class names
- Selects the top-1 prediction
- Gathers predictions across ranks
- Computes accuracy (ImageNet) or ROC-AUC (Hateful Memes)
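The prompt-construction step above can be sketched as follows. This is a minimal illustration, not the repository's actual code; `format_imagenet_prompt` and `build_context` are hypothetical helper names.

```python
def format_imagenet_prompt(class_name: str) -> str:
    # Matches the ImageNet prompt template described above.
    return f"<image>A photo of a {class_name}.<|endofchunk|>"

def build_context(demo_class_names: list[str]) -> str:
    # Few-shot context: one formatted demonstration per in-context example,
    # concatenated in order before the query image's prompt.
    return "".join(format_imagenet_prompt(c) for c in demo_class_names)

context = build_context(["tabby cat", "golden retriever"])
print(context)
# <image>A photo of a tabby cat.<|endofchunk|><image>A photo of a golden retriever.<|endofchunk|>
```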
Optional prompt ensembling averages scores over permutations of in-context examples.
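The ensembling idea can be sketched as averaging per-class scores over orderings of the in-context examples. This is a simplified stand-in, assuming a generic `score_fn` in place of the model's `get_rank_classifications()` log-probability scoring.

```python
import itertools

def ensembled_scores(demos, class_names, score_fn):
    """Average each class's score over all permutations of the demos.

    score_fn(demos, class_name) -> float stands in for the model's
    log-probability scoring of a candidate class given the prompt.
    """
    perms = list(itertools.permutations(demos))
    return {
        c: sum(score_fn(list(p), c) for p in perms) / len(perms)
        for c in class_names
    }

# Toy scorer: the score depends only on the class name, so the
# permutation average equals the single-prompt score here.
scores = ensembled_scores(["demo1", "demo2"], ["cat", "dog"],
                          lambda demos, c: float(len(c)))
print(scores)  # {'cat': 3.0, 'dog': 3.0}
```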
Usage
Called from the main evaluation loop for classification benchmarks.
Code Reference
- Source
  - Repository: https://github.com/mlfoundations/open_flamingo
  - File: `open_flamingo/eval/evaluate.py`, lines 1118-1297
- Signature
```python
def evaluate_classification(
    args: argparse.Namespace,
    eval_model,
    seed: int = 42,
    num_shots: int = 8,
    dataset_name: str = "imagenet",
    cached_features=None,
    no_kv_caching: bool = False,
    use_prompt_ensembling: bool = False,
) -> float:
    """Returns top-1 accuracy for ImageNet, ROC-AUC for Hateful Memes"""
```
- Import
```python
from open_flamingo.eval.evaluate import evaluate_classification
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | argparse.Namespace | Yes | Eval config with dataset paths |
| eval_model | BaseEvalModel | Yes | Model wrapper |
| seed | int | No | Random seed (default 42) |
| num_shots | int | No | Number of few-shot demonstrations (default 8) |
| dataset_name | str | No | "imagenet" or "hateful_memes" |
| no_kv_caching | bool | No | Disable KV cache optimization |
| use_prompt_ensembling | bool | No | Average over demo permutations |
| cached_features | Tensor | No | RICES features |
Outputs
| Type | Description |
|---|---|
| float | Top-1 accuracy for ImageNet or ROC-AUC for Hateful Memes |
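For ImageNet, the top-1 selection and accuracy computation can be sketched in plain Python. This is a minimal stand-in under the assumption that per-sample class scores are available as dicts; the repository computes Hateful Memes ROC-AUC differently (via scikit-learn's `roc_auc_score`).

```python
def top1_accuracy(class_logprobs: list[dict[str, float]],
                  labels: list[str]) -> float:
    # class_logprobs[i] maps each class name to its log-probability for
    # sample i; the prediction is the highest-scoring class per sample.
    preds = [max(lp, key=lp.get) for lp in class_logprobs]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

acc = top1_accuracy(
    [{"cat": -0.1, "dog": -2.3}, {"cat": -1.5, "dog": -0.4}],
    ["cat", "cat"],
)
print(acc)  # 0.5
```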
Usage Examples
```python
# Evaluate on ImageNet with 8-shot demonstrations
accuracy = evaluate_classification(
    args=args,
    eval_model=eval_model,
    seed=42,
    num_shots=8,
    dataset_name="imagenet",
    no_kv_caching=False,
    use_prompt_ensembling=False,
)
print(f"ImageNet top-1 accuracy: {accuracy:.4f}")
```
Related Pages
Principle:Mlfoundations_Open_flamingo_Classification_Evaluation