Heuristic: mlfoundations/open_flamingo RICES Feature Caching
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Evaluation, Computer_Vision |
| Last Updated | 2026-02-08 03:30 GMT |
Overview
Pre-compute and cache CLIP visual features for all training set images to enable fast retrieval of in-context examples during few-shot evaluation, avoiding redundant feature extraction.
Description
RICES (Retrieval-based In-Context Example Selection) selects the most similar training examples as in-context demonstrations for each test query during few-shot evaluation. This requires computing CLIP features for every image in the training set, which is expensive for large datasets like COCO (100K+ images). The feature-caching heuristic computes all features once and stores them as `.pkl` files, which can be loaded directly on subsequent evaluation runs. At query time, similarity is computed via a single matrix multiplication between the query features and the cached features.
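The cache-once, load-many pattern can be sketched as follows. This is a minimal illustration using `pickle` and random numpy arrays in place of real CLIP features; the function names are hypothetical, not the repo's API:

```python
import os
import pickle
import tempfile

import numpy as np

def cache_features(features: np.ndarray, path: str) -> None:
    # Pay the feature-extraction cost once, then persist to disk.
    with open(path, "wb") as f:
        pickle.dump(features, f)

def load_features(path: str) -> np.ndarray:
    # Later evaluation runs load the cache instead of re-encoding images.
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in for CLIP ViT-L-14 image features: 1000 images, 768-dim.
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 768)).astype(np.float32)
feats /= np.linalg.norm(feats, axis=-1, keepdims=True)  # L2-normalize once

path = os.path.join(tempfile.gettempdir(), "coco_demo.pkl")
cache_features(feats, path)       # first run: extract and store
cached = load_features(path)      # every later run: load directly
```

Because the features are normalized before caching, no per-query normalization of the training side is needed at retrieval time.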
Usage
Apply this heuristic when running Few-Shot Evaluation with the `--rices` flag. Pre-cache features using `scripts/cache_rices_features.py` and pass the directory via `--cached_demonstration_features`. This is especially important when running multiple evaluation configs (different shot counts, seeds) against the same training set.
The Insight (Rule of Thumb)
- Action: Pre-compute CLIP features for all training images using `cache_rices_features.py`. Store as `.pkl` files per dataset (e.g., `coco.pkl`, `imagenet.pkl`). Pass cached directory via `--cached_demonstration_features`.
- Value: Saves the full feature extraction time for every evaluation run after the first.
- Trade-off: Requires upfront disk space for cached features (roughly 2KB per image for ViT-L-14) and RAM to load all features at query time.
Reasoning
Feature extraction is the computational bottleneck in RICES. For COCO with ~113K training images, a full feature-extraction pass takes significant time on a single GPU. Since the training-set features do not change between evaluation runs, caching them eliminates this cost entirely. The cached features are L2-normalized vectors, so similarity computation reduces to a matrix multiplication followed by an argsort.
The RICES implementation returns examples in reverse order of similarity (most similar last), which places the most relevant demonstration closest to the test query in the prompt. This follows the Flamingo paper's finding that in-context example ordering matters.
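The retrieval and ordering logic can be sketched with numpy standing in for the torch tensors (a minimal illustration, not the repo's code):

```python
import numpy as np

# 50 cached, L2-normalized "training" features of dimension 8.
rng = np.random.default_rng(1)
train = rng.standard_normal((50, 8)).astype(np.float32)
train /= np.linalg.norm(train, axis=-1, keepdims=True)

query = train[7:8].copy()           # make image 7 an exact match
sim = (query @ train.T).squeeze()   # cosine similarity via one matmul
top = np.argsort(-sim)[:4]          # 4 most similar, best first
demos = list(reversed(top))         # most similar LAST, adjacent to the query
```

After the reversal, `demos[-1]` is the single best match, so the most relevant demonstration ends up closest to the test query in the assembled prompt.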
Code Evidence
Feature caching check from `open_flamingo/eval/rices.py:31-34`:
```python
# Precompute features
if cached_features is None:
    self.features = self._precompute_features()
else:
    self.features = cached_features
```
Feature normalization from `open_flamingo/eval/rices.py:58-60`:
```python
image_features = self.model.encode_image(inputs)
image_features /= image_features.norm(dim=-1, keepdim=True)
features.append(image_features.detach())
```
Similarity search and reverse ordering from `open_flamingo/eval/rices.py:86-95`:
```python
# Compute the similarity of the input image to the precomputed features
similarity = (query_feature @ self.features.T).squeeze()

# Get the indices of the 'num_examples' most similar images
indices = similarity.argsort(dim=-1, descending=True)[:, :num_examples]

# Return with the most similar images last
return [[self.dataset[i] for i in reversed(row)] for row in indices]
```
Cached feature loading from `open_flamingo/eval/evaluate.py:421-425`:
```python
if args.cached_demonstration_features is not None:
    cached_features = torch.load(
        f"{args.cached_demonstration_features}/flickr30.pkl", map_location="cpu"
    )
```