Implementation: OpenAI CLIP Accuracy Function
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Classification, Vision |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Pattern documentation for the accuracy() helper function and evaluation loop used to benchmark CLIP zero-shot classification with top-K metrics.
Description
The accuracy() function is a user-defined pattern (not part of the CLIP package) demonstrated in the Prompt Engineering notebook (cell 17). It computes top-K accuracy for a batch of logits against ground-truth labels using torch.topk(). The evaluation loop (cell 18) iterates over a DataLoader, computes normalized image features, multiplies them by the zero-shot classifier weights scaled by a temperature factor of 100.0, and accumulates running accuracy statistics.
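The top-K mechanics can be sketched with a tiny synthetic batch (the logits and labels below are made up purely for illustration):

```python
import torch

# torch.topk returns the K highest scores and their class indices per row.
logits = torch.tensor([
    [0.1, 0.9, 0.0],    # top prediction: class 1
    [0.8, 0.05, 0.15],  # top prediction: class 0, runner-up: class 2
])
target = torch.tensor([1, 2])  # ground-truth classes

# Indices of the top-2 predictions, transposed to shape [K, B]
pred = logits.topk(2, dim=1, largest=True, sorted=True)[1].t()
# Row k is True where the (k+1)-th guess matches the target
correct = pred.eq(target.view(1, -1).expand_as(pred))

top1 = correct[:1].reshape(-1).float().sum().item() * 100.0 / target.size(0)
top2 = correct[:2].reshape(-1).float().sum().item() * 100.0 / target.size(0)
print(top1, top2)  # 50.0 100.0 — sample 1 is only correct within the top-2
```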
Usage
Use this function after constructing zero-shot classifier weights (Zeroshot_Classifier step) and preparing a test dataset DataLoader. It produces top-1 and top-5 accuracy percentages.
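A hedged shape check of that pipeline (the 512-dimensional embeddings and 1000 classes below assume ViT-B/32 and ImageNet; random tensors stand in for real features and classifier weights):

```python
import torch

B = 4  # batch size (arbitrary for this sketch)
# Stand-ins for model.encode_image output and the zero-shot classifier
image_features = torch.randn(B, 512)
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
zeroshot_weights = torch.randn(512, 1000)
zeroshot_weights = zeroshot_weights / zeroshot_weights.norm(dim=0, keepdim=True)

# [B, 512] @ [512, 1000] -> [B, 1000], one score per class
logits = 100.0 * image_features @ zeroshot_weights
print(logits.shape)  # torch.Size([4, 1000])
```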
Code Reference
Source Location
- Repository: OpenAI CLIP
- File: notebooks/Prompt_Engineering_for_ImageNet.ipynb (cell 17: accuracy function, cell 18: evaluation loop)
Interface Specification
def accuracy(output: torch.Tensor, target: torch.Tensor, topk: tuple = (1,)) -> List[torch.Tensor]:
    """Compute top-K accuracy for the given predictions and targets.

    Parameters
    ----------
    output : torch.Tensor
        Logits or scores, shape [B, num_classes].
    target : torch.Tensor
        Ground truth class indices, shape [B].
    topk : tuple of int
        Which top-K accuracies to compute (e.g., (1, 5)).

    Returns
    -------
    List[torch.Tensor]
        Accuracy percentages for each K value, each as a 1-element tensor.
    """
    # Indices of the top max(topk) predictions, transposed to [max(topk), B]
    pred = output.topk(max(topk), 1, True, True)[1].t()
    # Compare each of the top-K guesses against the ground truth
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    result = []
    for k in topk:
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        result.append(correct_k.mul_(100.0 / target.size(0)))
    return result
Import
# User-defined function — no package import
# Requires: torch (and typing.List for the annotation)
from typing import List

import torch
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| output | torch.Tensor | Yes | Logits or scores, shape [B, num_classes]. Computed as 100.0 * image_features @ zeroshot_weights |
| target | torch.Tensor | Yes | Ground truth class indices, shape [B] |
| topk | tuple of int | No | Which top-K accuracies to compute. Default: (1,) |
Outputs
| Name | Type | Description |
|---|---|---|
| result | List[torch.Tensor] | Top-K accuracy percentages for each K value. Each element is a scalar tensor. |
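A minimal sketch of this output contract, using a synthetic identity-matrix batch where every prediction is correct (not data from the notebook):

```python
import torch

def accuracy(output, target, topk=(1,)):
    # Same pattern as documented above
    pred = output.topk(max(topk), 1, True, True)[1].t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    return [
        correct[:k].reshape(-1).float().sum(0, keepdim=True).mul_(100.0 / target.size(0))
        for k in topk
    ]

logits = torch.eye(5) * 10  # each sample's highest score is its own class index
target = torch.arange(5)
acc1, acc5 = accuracy(logits, target, topk=(1, 5))
# Each element is a 1-element tensor; use .item() to get a Python float
print(acc1.item(), acc5.item())  # 100.0 100.0
```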
Usage Examples
Complete Evaluation Loop
import clip
import torch
from tqdm import tqdm

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Assume zeroshot_weights already constructed (shape [512, 1000])
# Assume loader is a DataLoader over ImageNet/ImageNetV2 with preprocess

def accuracy(output, target, topk=(1,)):
    pred = output.topk(max(topk), 1, True, True)[1].t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    return [
        correct[:k].reshape(-1).float().sum(0, keepdim=True).mul_(100.0 / target.size(0))
        for k in topk
    ]

# Evaluation loop
with torch.no_grad():
    top1, top5, n = 0.0, 0.0, 0.0
    for images, target in tqdm(loader):
        images = images.to(device)
        target = target.to(device)

        # Get normalized image features
        image_features = model.encode_image(images)
        image_features /= image_features.norm(dim=-1, keepdim=True)

        # Compute logits with the 100.0 temperature scalar
        logits = 100.0 * image_features @ zeroshot_weights

        # Per-batch accuracy percentages
        acc1, acc5 = accuracy(logits, target, topk=(1, 5))

        # Accumulate, weighting each batch's percentage by its size
        top1 += acc1[0] * images.size(0)
        top5 += acc5[0] * images.size(0)
        n += images.size(0)

top1 = (top1 / n).item()
top5 = (top5 / n).item()
print(f"Top-1 accuracy: {top1:.2f}%")
print(f"Top-5 accuracy: {top5:.2f}%")
# Expected for ViT-B/32 on ImageNetV2: ~55.93% top-1, ~83.36% top-5
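The loop's batch-weighted accumulation reduces to an overall percentage even when batch sizes differ; a minimal arithmetic sketch with made-up per-batch numbers:

```python
# (per-batch top-1 %, batch size) — hypothetical values for illustration
batches = [(50.0, 2), (100.0, 2)]

top1, n = 0.0, 0
for acc, bsz in batches:
    top1 += acc * bsz  # weight each batch's percentage by its size
    n += bsz

print(top1 / n)  # 75.0 — the overall accuracy across all 4 samples
```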