
Principle: OpenAI CLIP Top-K Accuracy Evaluation

From Leeroopedia
Knowledge Sources
Domains Evaluation, Classification, Vision
Last Updated 2026-02-13 22:00 GMT

Overview

An evaluation protocol that measures classification performance by computing the percentage of test samples whose true label appears among the model's top-K highest-scoring predictions.

Description

Top-K Accuracy Evaluation is the standard metric for assessing image classification systems, particularly on large-scale benchmarks like ImageNet. For each test image, the model produces a ranking of all classes by their predicted scores. Top-1 accuracy checks if the highest-scoring class matches the true label. Top-5 accuracy checks if the true label appears anywhere in the top 5 predictions.
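The distinction can be made concrete with a toy example (hypothetical scores, one image over six classes): the true class need not score highest to count as a top-5 hit.

```python
import torch

# Hypothetical per-class scores for a single image; true class is index 1.
logits = torch.tensor([[0.1, 0.7, 0.2, 0.9, 0.3, 0.5]])
target = torch.tensor([1])

_, top1 = logits.topk(1, dim=1)  # highest-scoring class only
_, top5 = logits.topk(5, dim=1)  # five highest-scoring classes

top1_hit = (top1 == target.view(-1, 1)).any(dim=1).item()  # False: class 3 scored highest
top5_hit = (top5 == target.view(-1, 1)).any(dim=1).item()  # True: class 1 ranks second
```

Here the image is a top-1 error but a top-5 success, which is exactly the leniency the looser metric is designed to capture.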

In the CLIP prompt-engineering workflow, the evaluation pipeline consists of:

  1. Logit computation: Multiply L2-normalized image features by the zero-shot classifier weight matrix with a temperature scalar (100.0) to produce per-class logits.
  2. Top-K extraction: Use torch.topk() to find the K highest-scoring class indices for each image.
  3. Correctness check: Compare the top-K predictions against the ground truth labels.
  4. Accuracy aggregation: Average the correctness across all test images, computing running means per batch.

The temperature scalar of 100.0 applied to the cosine similarity logits is standard in CLIP evaluation and matches the learned logit_scale from training.
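Step 1 of the pipeline above can be sketched as follows. This is a minimal sketch, not CLIP's actual evaluation script: the function name `zero_shot_logits` and the random stand-in features are illustrative, while the L2 normalization and the 100.0 temperature scalar come from the description above.

```python
import torch
import torch.nn.functional as F

def zero_shot_logits(image_features, zeroshot_weights, temperature=100.0):
    """Per-class logits from image features and a zero-shot classifier.

    image_features:   [B, D] raw image embeddings
    zeroshot_weights: [D, C] per-class text embeddings (assumed pre-normalized)
    """
    # L2-normalize each image embedding so the matmul yields cosine similarities
    image_features = F.normalize(image_features, dim=-1)
    return temperature * image_features @ zeroshot_weights

# Shape check with random stand-in features: 8 images, 512-dim, 10 classes
weights = F.normalize(torch.randn(512, 10), dim=0)
logits = zero_shot_logits(torch.randn(8, 512), weights)
print(logits.shape)
```

Because both operands are unit-normalized, the raw products lie in [-1, 1]; the temperature rescales them to a range where softmax and ranking behave as they did during training.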

Usage

Use this principle when benchmarking CLIP zero-shot classification performance on standard datasets. Report both top-1 and top-5 accuracy for comparison with published results.

Theoretical Basis

Top-K accuracy measures classification quality with increasing leniency:

# Top-K accuracy computation
# logits: [B, num_classes] = image_features @ zeroshot_weights (scaled by 100.0)
# target: [B] = ground truth class indices
import torch

K = 5

# 1. Find top-K predictions
_, pred = logits.topk(K, dim=1, largest=True, sorted=True)
# pred: [B, K]

# 2. Check if true label is in top-K
pred = pred.t()           # [K, B]
correct = pred.eq(target.view(1, -1).expand_as(pred))  # [K, B] boolean

# 3. Compute accuracy for each K value
batch_size = target.size(0)
for k in [1, 5]:
    correct_k = correct[:k].reshape(-1).float().sum()
    accuracy_k = correct_k / batch_size * 100.0
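Step 4 of the pipeline, aggregating per-batch correctness into dataset-level accuracy, can be sketched as below. The function name `evaluate` and the toy batches are illustrative; accumulating raw correct counts and dividing by the total sample count gives the same result as a running mean.

```python
import torch

def evaluate(batches, topk=(1, 5)):
    """Aggregate top-K correct counts over (logits, target) batches.

    Returns {k: accuracy in percent} over all samples seen.
    """
    correct = {k: 0.0 for k in topk}
    total = 0
    for logits, target in batches:
        _, pred = logits.topk(max(topk), dim=1, largest=True, sorted=True)
        pred = pred.t()                                        # [K, B]
        hits = pred.eq(target.view(1, -1).expand_as(pred))     # [K, B] boolean
        for k in topk:
            correct[k] += hits[:k].reshape(-1).float().sum().item()
        total += target.size(0)
    return {k: 100.0 * correct[k] / total for k in topk}

# Toy check: 3 images, 6 classes (hypothetical logits)
batches = [
    (torch.tensor([[0., 0., 0., 1., 0., 0.],
                   [1., 0., 0., 0., 0., 0.]]), torch.tensor([3, 0])),
    (torch.tensor([[0.9, 0.8, 0., 0., 0., 0.]]), torch.tensor([1])),
]
acc = evaluate(batches)
```

In this toy run the third image is a top-1 miss but a top-5 hit, so top-1 accuracy is 2/3 while top-5 accuracy is 100%.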

Expected results (CLIP paper, ViT-B/32 on ImageNetV2 with 80-template ensemble):

  • Top-1: ~55.93%
  • Top-5: ~83.36%

Related Pages

Implemented By
