Implementation: OpenAI CLIP clip.load()
| Knowledge Sources | |
|---|---|
| Domains | Vision, NLP, Transfer_Learning |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Concrete tool for loading pretrained CLIP models provided by the OpenAI CLIP library.
Description
The clip.load() function is the primary entry point for obtaining a ready-to-use CLIP model. It handles the full lifecycle: resolving a model name to a download URL, downloading the checkpoint with SHA-256 integrity verification, loading the weights as either a JIT archive or a raw state dictionary, constructing the appropriate architecture (ModifiedResNet or VisionTransformer) via build_model(), placing the model on the target device, and returning both the model and a matched preprocessing transform.
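The integrity-verification step amounts to hashing the downloaded file and comparing the digest against the SHA-256 value embedded in the download URL. A simplified stdlib sketch of that check (verify_checkpoint is a hypothetical helper illustrating the idea, not the library's actual _download code):

```python
import hashlib
import os
import tempfile

def verify_checkpoint(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA-256 digest against the expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large checkpoints don't need to fit in memory
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Demo with a throwaway file standing in for a downloaded checkpoint
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "model.pt")
    with open(ckpt, "wb") as f:
        f.write(b"fake checkpoint bytes")
    expected = hashlib.sha256(b"fake checkpoint bytes").hexdigest()
    print(verify_checkpoint(ckpt, expected))  # True
```

If the digest does not match, the library treats the cached file as corrupt and re-downloads it; the sketch simply returns False in that case.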
The function supports 9 pretrained model variants: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT-B/32, ViT-B/16, ViT-L/14, and ViT-L/14@336px. It also accepts a local file path to a custom checkpoint.
Usage
Import and call this function as the first step in any CLIP workflow. It is required before performing zero-shot classification, feature extraction, linear probing, or any other task that uses CLIP embeddings.
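Once the model is loaded, the zero-shot classification step reduces to comparing the image embedding against each text embedding by cosine similarity, scaling by the model's learned logit scale (around 100 in the released checkpoints), and applying softmax. A minimal stdlib sketch of that scoring math, using made-up 3-d vectors in place of CLIP's actual embedding outputs:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for model.encode_image / model.encode_text outputs
image_emb = [0.9, 0.1, 0.2]
text_embs = [
    [0.8, 0.2, 0.1],  # e.g. embedding of "a photo of a dog"
    [0.1, 0.9, 0.3],  # e.g. embedding of "a photo of a cat"
]

# Scale cosine similarities by the (assumed) logit scale, then softmax
logits = [100.0 * cosine(image_emb, t) for t in text_embs]
probs = softmax(logits)
```

In real usage the embeddings come from model.encode_image() and model.encode_text() and are L2-normalized before the dot product; the scaled-softmax step is the same.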
Code Reference
Source Location
- Repository: OpenAI CLIP
- File: clip/clip.py
- Lines: L94-202
Signature
def load(
name: str,
device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu",
jit: bool = False,
download_root: str = None
) -> Tuple[torch.nn.Module, Callable[[PIL.Image.Image], torch.Tensor]]:
"""Load a CLIP model.
Parameters
----------
name : str
A model name listed by clip.available_models(), or the path to a
model checkpoint containing the state_dict. Available models:
RN50, RN101, RN50x4, RN50x16, RN50x64, ViT-B/32, ViT-B/16,
ViT-L/14, ViT-L/14@336px.
device : Union[str, torch.device]
The device to put the loaded model. Default: "cuda" if available,
else "cpu".
jit : bool
Whether to load the optimized JIT model or more hackable non-JIT
model (default: False).
download_root : str
Path to download the model files; by default uses
"~/.cache/clip".
Returns
-------
model : torch.nn.Module
The CLIP model in eval mode.
preprocess : Callable[[PIL.Image.Image], torch.Tensor]
A torchvision transform that converts a PIL image into a tensor
that the returned model can take as its input.
"""
Import
import clip
# or
from clip import load
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Model name from clip.available_models() (e.g. "ViT-B/32") or path to a local checkpoint file |
| device | Union[str, torch.device] | No | Target device for the model. Default: "cuda" if available, else "cpu" |
| jit | bool | No | Whether to load the JIT-traced model variant. Default: False |
| download_root | str | No | Directory for downloading model files. Default: ~/.cache/clip |
Outputs
| Name | Type | Description |
|---|---|---|
| model | torch.nn.Module (CLIP) | The CLIP dual-encoder model in eval mode, placed on the specified device. Contains encode_image(), encode_text(), and forward() methods. |
| preprocess | torchvision.transforms.Compose | Image preprocessing pipeline: Resize(n_px, bicubic) -> CenterCrop(n_px) -> RGB conversion -> ToTensor -> Normalize(mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711]). The resolution n_px is derived from model.visual.input_resolution. |
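The final Normalize step maps each channel of a [0, 1] tensor to (x - mean) / std using the per-channel statistics in the table above. A quick stdlib check of what one mid-gray pixel becomes after normalization (normalize_pixel is an illustrative helper, not part of the library):

```python
# CLIP's per-channel normalization statistics (from the preprocess pipeline)
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def normalize_pixel(rgb):
    """Apply CLIP's per-channel (x - mean) / std to one pixel in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, CLIP_MEAN, CLIP_STD))

# A 50% gray pixel, as produced by ToTensor from RGB (128, 128, 128)-ish input
print(normalize_pixel((0.5, 0.5, 0.5)))
```

torchvision's Normalize applies exactly this arithmetic per channel across the whole tensor; feeding the model images normalized with different statistics degrades accuracy, which is why clip.load() returns the matched transform.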
Usage Examples
Basic Model Loading
import clip
import torch
# List available models
print(clip.available_models())
# ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64',
# 'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']
# Load a ViT-B/32 model on GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
# model is ready for inference
# preprocess is a torchvision.transforms.Compose pipeline
Loading on CPU
import clip
# Force CPU loading (model weights converted to float32)
model, preprocess = clip.load("RN50", device="cpu")
Loading from Local Checkpoint
import clip
# Load from a previously downloaded checkpoint
model, preprocess = clip.load("/path/to/ViT-B-32.pt", device="cuda")
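Custom Download Location
When download_root is left as None, checkpoints are cached under ~/.cache/clip, with the filename taken from the download URL. A sketch of that default-path resolution (resolve_cache_path is a hypothetical helper mirroring the documented default, not the library's own code):

```python
import os

def resolve_cache_path(url_filename: str, download_root: str = None) -> str:
    """Mirror clip.load's documented default: ~/.cache/clip unless overridden."""
    root = download_root or os.path.expanduser("~/.cache/clip")
    return os.path.join(root, url_filename)

print(resolve_cache_path("ViT-B-32.pt"))
print(resolve_cache_path("ViT-B-32.pt", download_root="/tmp/clip-models"))
```

Passing download_root to clip.load() is useful on shared machines or in containers where the home directory is not writable or not persistent.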