Implementation: OpenAI CLIP Class Label Template Preparation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Vision, Zero_Shot_Learning |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Pattern documentation for defining class name lists and prompt template collections for CLIP prompt-engineered zero-shot classification.
Description
This is a Pattern Doc documenting the user-defined data structures required for prompt-engineered classification. The CLIP repository provides 80 ImageNet-specific prompt templates in the Prompt Engineering notebook (cell 10) and references 3,401 lines of templates for 26 benchmarks in data/prompts.md. Users must define:
- imagenet_classes: A list of 1000 curated class names, modified from standard ImageNet labels for disambiguation (e.g., "tench" stays as "tench", but "nail" becomes "metal nail").
- imagenet_templates: A list of 80 prompt template strings, each containing a '{}' placeholder that will be filled with the class name.
These are pure Python data definitions with no third-party dependencies (the `List[str]` annotations require only `typing` from the standard library).
Usage
Define these data structures before constructing zero-shot classifier weights. The class names should match the target dataset's label semantics, and templates should provide diverse contextual framing.
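The disambiguation described above can be applied as a small override map on top of the standard labels. The two renames shown are the ones from the notebook; the map itself is illustrative:

```python
# Renames applied on top of standard ImageNet labels
# (the "nail" and "kite" examples are from the notebook; the rest of the
# override map would be filled in per dataset)
overrides = {"nail": "metal nail", "kite": "kite (bird of prey)"}

standard_labels = ["tench", "nail", "kite"]
classnames = [overrides.get(name, name) for name in standard_labels]
# classnames == ["tench", "metal nail", "kite (bird of prey)"]
```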
Code Reference
Source Location
- Repository: OpenAI CLIP
- File: notebooks/Prompt_Engineering_for_ImageNet.ipynb (cells 8 and 10)
- Additional reference: data/prompts.md (3,401 lines of templates for 26 benchmarks)
Interface Specification
from typing import List  # only needed for the annotations below

# Class name list: List[str]
# Each entry is a disambiguated class name
imagenet_classes: List[str] = [
"tench",
"goldfish",
"great white shark",
# ... 1000 classes total
"metal nail", # disambiguated from "nail"
"kite (bird of prey)", # disambiguated from "kite"
# ...
]
# Template list: List[str]
# Each entry contains {} as a placeholder for the class name
imagenet_templates: List[str] = [
"a bad photo of a {}.",
"a photo of many {}.",
"a sculpture of a {}.",
"a photo of the hard to see {}.",
"a low resolution photo of the {}.",
"a rendering of a {}.",
"graffiti of a {}.",
"a bad photo of the {}.",
"a cropped photo of the {}.",
"a tattoo of the {}.",
"the embroidered {}.",
"a photo of a hard to see {}.",
"a bright photo of a {}.",
"a photo of a clean {}.",
"a photo of a dirty {}.",
"a dark photo of the {}.",
"a drawing of a {}.",
"a photo of my {}.",
"the plastic {}.",
"a photo of the cool {}.",
# ... 80 templates total
"a photo of a {}.",
"itap of a {}.", # "I took a picture of a"
]
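When adapting or authoring a template list, a quick sanity check catches malformed entries before they reach the classifier-weight step. A minimal sketch (the helper name is illustrative, not part of the CLIP repo):

```python
def validate_templates(templates):
    """Check each template has exactly one '{}' placeholder and formats cleanly."""
    for i, t in enumerate(templates):
        if t.count("{}") != 1:
            raise ValueError(f"template {i} must contain exactly one '{{}}': {t!r}")
        t.format("test")  # raises on stray unmatched braces

validate_templates(["a photo of a {}.", "itap of a {}."])  # passes silently
```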
Import
# No third-party imports required — these are plain Python lists
# (from typing import List only if you keep the annotations)
# Typically defined inline in a notebook or script
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dataset_classes | source data | Yes | The class label vocabulary of the target dataset (e.g., ImageNet 1000 classes, CIFAR-100 classes) |
| template_collection | source data | No | Reference templates from data/prompts.md or custom-designed templates |
Outputs
| Name | Type | Description |
|---|---|---|
| classnames | List[str] | Curated, disambiguated class name strings (len = number of classes) |
| templates | List[str] | Prompt template strings with {} placeholder (len = number of templates, e.g. 80 for ImageNet) |
Usage Examples
Defining Custom Classes and Templates
# For a custom dataset with 5 animal classes
classnames = ["cat", "dog", "goldfish", "parrot", "hamster"]
# Simple templates
templates = [
"a photo of a {}.",
"a blurry photo of a {}.",
"a photo of the large {}.",
"a photo of the small {}.",
"a photo of a {} in the wild.",
]
# Generate all combinations
for classname in classnames:
texts = [template.format(classname) for template in templates]
# e.g., ["a photo of a cat.", "a blurry photo of a cat.", ...]
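In the CLIP notebook, each class's filled templates are then encoded and mean-pooled into a single classifier weight per class (prompt ensembling). A dependency-free sketch of the pooling step, using plain lists as stand-ins for real text embeddings (`mean_pool` is a stand-in, not the CLIP API, which would use `model.encode_text` on tokenized prompts):

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Stand-in "embeddings" for 3 templates of one class; real ones come from
# encoding the tokenized prompts with the CLIP text encoder
template_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_weight = mean_pool(template_embeddings)
# class_weight ≈ [0.667, 0.667]
```

The notebook additionally L2-normalizes the pooled vector before stacking the per-class weights into the final classifier matrix.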
Using Prompts from data/prompts.md
# The CLIP repo provides templates for 26 benchmarks in data/prompts.md
# Format in prompts.md:
# ### DatasetName
# - "template with {}."
# - "another template with {}."
# For ImageNet, 80 templates are used (defined in notebook cell 10)
# For other datasets, see data/prompts.md for dataset-specific templates
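The bullet format shown above can be parsed with a few lines of standard-library Python. This sketch assumes the exact `### Name` / `- "template"` layout described in the comments, and parses an in-memory string for illustration (file handling omitted):

```python
import re

def parse_prompts(text):
    """Parse '### Dataset' sections with '- "template"' bullets into a dict."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            sections[current] = []
        elif current is not None:
            m = re.match(r'-\s*"(.*)"', line.strip())
            if m:
                sections[current].append(m.group(1))
    return sections

sample = '''### CIFAR100
- "a photo of a {}."
- "a blurry photo of a {}."'''
parsed = parse_prompts(sample)
# parsed == {"CIFAR100": ["a photo of a {}.", "a blurry photo of a {}."]}
```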