Principle:Mlfoundations Open flamingo Evaluation Model Abstraction
Overview
Design pattern providing a unified evaluation interface that abstracts model-specific details behind standardized methods for text generation and classification scoring.
Description
The evaluation abstraction provides a BaseEvalModel interface with two core methods: get_outputs() for generating text from image-text inputs, and get_rank_classifications() for scoring class names using log-probabilities. This allows the evaluation framework to work with different model architectures (OpenFlamingo, BLIP-2) through a common interface.
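A minimal sketch of the two-method interface and of evaluation code that depends only on it. The argument lists are simplified assumptions and may not match the repository's exact signatures. `LookupEvalModel` and `predict_classes` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class BaseEvalModel(ABC):
    """Simplified evaluation interface (argument lists are assumptions)."""

    @abstractmethod
    def get_outputs(self, batch_text: List[str], batch_images: List[object],
                    max_new_tokens: int) -> List[str]:
        """Generate one text output per (images, prompt) example."""

    @abstractmethod
    def get_rank_classifications(self, batch_text: List[str], batch_images: List[object],
                                 all_class_names: Sequence[str]) -> List[List[float]]:
        """Return, per example, one log-probability score per class name."""


class LookupEvalModel(BaseEvalModel):
    """Hypothetical toy backend: answers and class scores come from fixed tables."""

    def __init__(self, answers: Dict[str, str], class_logprobs: Dict[str, float]):
        self.answers = answers
        self.class_logprobs = class_logprobs

    def get_outputs(self, batch_text, batch_images, max_new_tokens):
        # Crude "generation": truncate the stored answer to max_new_tokens words
        return [" ".join(self.answers.get(t, "").split()[:max_new_tokens]) for t in batch_text]

    def get_rank_classifications(self, batch_text, batch_images, all_class_names):
        # Same per-class scores for every example; unknown classes get -inf
        return [[self.class_logprobs.get(c, float("-inf")) for c in all_class_names]
                for _ in batch_text]


def predict_classes(model: BaseEvalModel, batch_text, batch_images, class_names):
    """Evaluation-side logic: pick the highest-scoring class, whatever the backend."""
    scores = model.get_rank_classifications(batch_text, batch_images, class_names)
    return [class_names[max(range(len(row)), key=row.__getitem__)] for row in scores]
```

Swapping in an OpenFlamingo or BLIP-2 backend would only require another `BaseEvalModel` subclass; `predict_classes` stays unchanged.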
The OpenFlamingo EvalModel wraps the Flamingo model with additional functionality:
- KV-cache-based classification for efficiency
- Task-specific prompt formatting (VQA, captioning, ImageNet, Hateful Memes)
- DistributedDataParallel (DDP) support for multi-GPU evaluation
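A sketch of task-specific prompt formatting in the interleaved Flamingo style, where `<image>` marks an image position and `<|endofchunk|>` closes a completed demonstration. The exact template strings and helper names below are illustrative assumptions, not copied from the repository. In-context demonstrations include their answer and close the chunk; the final query is left open so the model generates (or is scored on) the completion.

```python
from typing import List, Optional

IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


def _close(prefix: str, completion: Optional[str]) -> str:
    # Demonstrations carry a completion and end the chunk; queries stay open.
    if completion is None:
        return prefix
    return f"{prefix}{completion}{END_OF_CHUNK}"


def vqa_prompt(question: str, answer: Optional[str] = None) -> str:
    return _close(f"{IMAGE_TOKEN}Question:{question} Short answer:", answer)


def caption_prompt(caption: Optional[str] = None) -> str:
    return _close(f"{IMAGE_TOKEN}Output:", caption)


def imagenet_prompt(label: Optional[str] = None) -> str:
    return _close(f"{IMAGE_TOKEN}A photo of a ", label)


def hateful_memes_prompt(text: str, label: Optional[str] = None) -> str:
    return _close(f"{IMAGE_TOKEN}is an image with: '{text}' written on it. "
                  f"Is it hateful? Answer:", label)


def build_context(demos: List[str], query: str) -> str:
    # Few-shot context: demonstrations first, the open query last.
    return "".join(demos) + query
```

For example, `build_context([vqa_prompt("What color is the sky?", "blue")], vqa_prompt("What is the cat doing?"))` yields one closed demonstration followed by an open query, with one `<image>` token per image.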
Usage
Use this abstraction when setting up a model for evaluation across multiple benchmarks. It provides the interface expected by evaluate_captioning, evaluate_vqa, and evaluate_classification.
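A sketch of how benchmark functions can consume any object exposing the two methods, using structural typing (`typing.Protocol`) so the backend need not inherit from a base class. `vqa_exact_match`, `classification_accuracy`, and `StubModel` are hypothetical simplifications: real VQA scoring uses its own accuracy metric rather than plain exact match, and captioning would go through `get_outputs` in the same way before computing a captioning metric.

```python
from typing import List, Protocol, Sequence, Tuple

Example = Tuple[object, str, str]  # (image, prompt, gold answer or label)


class EvalModel(Protocol):
    def get_outputs(self, batch_text: List[str], batch_images: List[object],
                    max_new_tokens: int) -> List[str]: ...

    def get_rank_classifications(self, batch_text: List[str], batch_images: List[object],
                                 all_class_names: Sequence[str]) -> List[List[float]]: ...


def vqa_exact_match(model: EvalModel, examples: Sequence[Example],
                    max_new_tokens: int = 5) -> float:
    images = [img for img, _, _ in examples]
    prompts = [prompt for _, prompt, _ in examples]
    outputs = model.get_outputs(prompts, images, max_new_tokens)
    # Case-insensitive exact match (a simplification of real VQA scoring)
    hits = sum(out.strip().lower() == gold.strip().lower()
               for out, (_, _, gold) in zip(outputs, examples))
    return hits / len(examples)


def classification_accuracy(model: EvalModel, examples: Sequence[Example],
                            class_names: Sequence[str]) -> float:
    images = [img for img, _, _ in examples]
    prompts = [prompt for _, prompt, _ in examples]
    scores = model.get_rank_classifications(prompts, images, class_names)
    # Top-1 prediction = class with the highest log-probability score
    preds = [class_names[max(range(len(row)), key=row.__getitem__)] for row in scores]
    hits = sum(pred == gold for pred, (_, _, gold) in zip(preds, examples))
    return hits / len(examples)


class StubModel:
    """Hypothetical backend that satisfies EvalModel structurally."""

    def get_outputs(self, batch_text, batch_images, max_new_tokens):
        return ["Blue" if "sky" in text else "unknown" for text in batch_text]

    def get_rank_classifications(self, batch_text, batch_images, all_class_names):
        # Score 0.0 if the class name appears in the image id, else -5.0
        return [[0.0 if name in str(img) else -5.0 for name in all_class_names]
                for img in batch_images]
```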
Theoretical Basis
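The Strategy pattern allows model implementations to be swapped without changing evaluation logic. The KV-cache optimization in classification avoids re-encoding the shared prompt prefix (the in-context examples and the query prompt) for every class name: the prefix is encoded once, its key/value cache is reused, and only the class-name tokens are processed for each class. Per-class cost therefore scales with the length of the class name rather than the full prompt length. With a prefix of P tokens and n_classes classes, total cost drops from n_classes full forward passes over P plus the class tokens to a single pass over the P prefix tokens plus n_classes short passes over the class tokens.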
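A toy illustration of prefix caching for rank classification. `ToyCausalLM` is a hypothetical one-layer stand-in, not OpenFlamingo: each token's cached entry depends only on its id and position (as a single attention layer's keys/values would), and mean pooling over the cache stands in for attention. The point is the bookkeeping: both paths give identical class scores, but the cached path encodes the prefix once, so token encodings fall from n_classes * P + sum(class lengths) to P + sum(class lengths).

```python
import math
from typing import List, Sequence

Vector = List[float]


class ToyCausalLM:
    """Hypothetical toy causal LM used only to illustrate prefix caching."""

    def __init__(self, vocab: Sequence[str]):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.vocab_size = len(vocab)
        self.tokens_encoded = 0  # cost counter: per-token encodings performed

    def _encode(self, token: str, position: int) -> Vector:
        self.tokens_encoded += 1
        i = self.index[token]
        return [math.sin(i + 1.0 + position), math.cos(2.0 * i + position)]

    def extend(self, cache: List[Vector], tokens: Sequence[str]) -> List[Vector]:
        """Return a new cache with `tokens` appended; the input cache is not mutated."""
        new_cache = list(cache)
        for tok in tokens:
            new_cache.append(self._encode(tok, len(new_cache)))
        return new_cache

    def next_token_logprobs(self, cache: List[Vector]) -> List[float]:
        # Mean-pool the cached entries (stand-in for attention), then log-softmax
        pooled = [sum(v[k] for v in cache) / len(cache) for k in range(2)]
        logits = [pooled[0] * (j + 1) - pooled[1] * j for j in range(self.vocab_size)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        return [x - log_z for x in logits]


def score_completion(model: ToyCausalLM, cache: List[Vector], completion: Sequence[str]) -> float:
    """Sum of log P(token | all preceding tokens) over the class-name tokens."""
    total = 0.0
    for tok in completion:
        total += model.next_token_logprobs(cache)[model.index[tok]]
        cache = model.extend(cache, [tok])  # rebinds locally; caller's cache untouched
    return total


def rank_classes(model: ToyCausalLM, prefix: Sequence[str],
                 class_tokens: Sequence[Sequence[str]], use_cache: bool) -> List[float]:
    if use_cache:
        prefix_cache = model.extend([], prefix)  # encode the shared prefix once
        return [score_completion(model, prefix_cache, c) for c in class_tokens]
    # Without caching, every class re-encodes the full prefix
    return [score_completion(model, model.extend([], prefix), c) for c in class_tokens]
```

With a 4-token prefix and classes `["cat"]`, `["dog"]`, `["tabby", "cat"]`, the cached path performs 4 + 4 = 8 token encodings versus 3 * 4 + 4 = 16 without caching, while producing the same scores.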