
Implementation:OpenAI CLIP Accuracy Function

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Classification, Vision
Last Updated: 2026-02-13 22:00 GMT

Overview

Pattern documentation for the accuracy() helper function and evaluation loop used to benchmark CLIP zero-shot classification with top-K metrics.

Description

The accuracy() function is a user-defined pattern (not part of the clip package) demonstrated in the Prompt Engineering notebook (cell 17). It computes top-K accuracy for a batch of logits against ground-truth labels using torch.topk(). The evaluation loop (cell 18) iterates over a DataLoader, computes L2-normalized image features, multiplies them by the zero-shot classifier weights with a fixed logit scale of 100.0, and accumulates running accuracy statistics.
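As a quick illustration of the torch.topk() call at the heart of the pattern (a minimal sketch, not taken from the notebook):

```python
import torch

# topk returns the k largest values and their indices along a dimension;
# the accuracy() helper uses the indices as ranked class predictions
scores = torch.tensor([[0.2, 0.7, 0.1]])
values, indices = scores.topk(2, dim=1, largest=True, sorted=True)
# indices -> tensor([[1, 0]]): class 1 is ranked first, class 0 second
```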

Usage

Use this function after constructing zero-shot classifier weights (Zeroshot_Classifier step) and preparing a test dataset DataLoader. It produces top-1 and top-5 accuracy percentages.

Code Reference

Source Location

  • Repository: OpenAI CLIP
  • File: notebooks/Prompt_Engineering_for_ImageNet.ipynb (cell 17: accuracy function, cell 18: evaluation loop)

Interface Specification

def accuracy(output: torch.Tensor, target: torch.Tensor, topk: tuple = (1,)) -> List[torch.Tensor]:
    """Compute top-K accuracy for the given predictions and targets.

    Parameters
    ----------
    output : torch.Tensor
        Logits or scores, shape [B, num_classes].

    target : torch.Tensor
        Ground truth class indices, shape [B].

    topk : tuple of int
        Which top-K accuracies to compute (e.g., (1, 5)).

    Returns
    -------
    List[torch.Tensor]
        Accuracy percentages for each K value, each a one-element tensor
        (call .item() to extract a plain float).
    """
    # topk args are (k, dim, largest, sorted); [1] takes the indices,
    # and .t() transposes them to shape [max(topk), B]
    pred = output.topk(max(topk), 1, True, True)[1].t()
    # correct[i, j] is True when the (i+1)-th ranked prediction for
    # sample j matches its ground-truth label
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    result = []
    for k in topk:
        # Each target appears at most once per column, so summing the first
        # k rows counts the samples hit within the top k predictions
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        result.append(correct_k.mul_(100.0 / target.size(0)))
    return result
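To make the contract concrete, here is a self-contained toy run (values chosen for illustration; not from the notebook):

```python
import torch

def accuracy(output, target, topk=(1,)):
    pred = output.topk(max(topk), 1, True, True)[1].t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    return [correct[:k].reshape(-1).float().sum(0, keepdim=True).mul_(100.0 / target.size(0))
            for k in topk]

# 3 samples, 4 classes
logits = torch.tensor([
    [0.1, 0.9, 0.0, 0.0],    # ranked 1, 0, ...  target 1: top-1 hit
    [0.8, 0.1, 0.05, 0.05],  # ranked 0, 1, ...  target 2: miss at both k
    [0.3, 0.4, 0.2, 0.1],    # ranked 1, 0, ...  target 0: top-2 hit only
])
target = torch.tensor([1, 2, 0])

acc1, acc2 = accuracy(logits, target, topk=(1, 2))
print(acc1.item(), acc2.item())  # 1/3 and 2/3 of samples, as percentages
```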

Import

# User-defined function, no package import needed
# Requires: torch (plus typing.List for the signature annotation)
import torch
from typing import List

I/O Contract

Inputs

Name Type Required Description
output torch.Tensor Yes Logits or scores, shape [B, num_classes]. Computed as 100.0 * image_features @ zeroshot_weights
target torch.Tensor Yes Ground truth class indices, shape [B]
topk tuple of int No Which top-K accuracies to compute. Default: (1,)

Outputs

Name Type Description
result List[torch.Tensor] Top-K accuracy percentages for each K value. Each element is a scalar tensor.

Usage Examples

Complete Evaluation Loop

import clip
import torch
from tqdm import tqdm

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Assume zeroshot_weights already constructed (shape [512, 1000])
# Assume loader is a DataLoader over ImageNet/ImageNetV2 with preprocess

def accuracy(output, target, topk=(1,)):
    pred = output.topk(max(topk), 1, True, True)[1].t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    return [
        correct[:k].reshape(-1).float().sum(0, keepdim=True).mul_(100.0 / target.size(0))
        for k in topk
    ]

# Evaluation loop
with torch.no_grad():
    top1, top5, n = 0.0, 0.0, 0.0
    for images, target in tqdm(loader):
        images = images.to(device)
        target = target.to(device)

        # Get image features
        image_features = model.encode_image(images)
        image_features /= image_features.norm(dim=-1, keepdim=True)

        # Compute logits with temperature
        logits = 100.0 * image_features @ zeroshot_weights

        # Compute accuracy
        acc1, acc5 = accuracy(logits, target, topk=(1, 5))

        # Accumulate, weighting each batch's percentage by its size
        top1 += acc1[0] * images.size(0)
        top5 += acc5[0] * images.size(0)
        n += images.size(0)

top1 = (top1 / n).item()
top5 = (top5 / n).item()
print(f"Top-1 accuracy: {top1:.2f}%")
print(f"Top-5 accuracy: {top5:.2f}%")
# Expected for ViT-B/32 on ImageNetV2: ~55.93% top-1, ~83.36% top-5
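An alternative accumulation style (a sketch, not from the notebook) sums raw correct counts per batch and converts to a percentage once at the end, avoiding the need to weight per-batch percentages by batch size:

```python
import torch

def correct_counts(output, target, topk=(1,)):
    # Same top-K logic as accuracy(), but returns integer hit counts
    pred = output.topk(max(topk), dim=1).indices.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    return [int(correct[:k].reshape(-1).sum()) for k in topk]

logits = torch.tensor([[0.1, 0.9], [0.8, 0.2]])
target = torch.tensor([1, 1])

c1, = correct_counts(logits, target, topk=(1,))
n = target.size(0)
print(f"top-1: {100.0 * c1 / n:.2f}%")  # prints "top-1: 50.00%"
```

In a full loop, the counts and sample totals accumulate as plain Python ints, so no tensor arithmetic or .item() calls are needed outside the batch step.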

Related Pages

Implements Principle

Requires Environment
