
Implementation:OpenAI CLIP clip.load

From Leeroopedia
Knowledge Sources
Domains: Vision, NLP, Transfer_Learning
Last Updated: 2026-02-13 22:00 GMT

Overview

Concrete tool, provided by the OpenAI CLIP library, for loading pretrained CLIP models.

Description

The clip.load() function is the primary entry point for obtaining a ready-to-use CLIP model. It handles the full lifecycle: resolving a model name to a download URL, downloading the checkpoint with SHA-256 integrity verification, loading the weights as either a JIT archive or a raw state dictionary, constructing the appropriate architecture (ModifiedResNet or VisionTransformer) via build_model(), placing the model on the target device, and returning both the model and a matched preprocessing transform.

The function supports 9 pretrained model variants: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT-B/32, ViT-B/16, ViT-L/14, and ViT-L/14@336px. It also accepts a local file path to a custom checkpoint.

Usage

Import and call this function as the first step in any CLIP workflow. It is required before performing zero-shot classification, feature extraction, linear probing, or any other task that uses CLIP embeddings.

Code Reference

Source Location

  • Repository: OpenAI CLIP
  • File: clip/clip.py
  • Lines: L94-202

Signature

def load(
    name: str,
    device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu",
    jit: bool = False,
    download_root: str = None
) -> Tuple[torch.nn.Module, Callable[[PIL.Image.Image], torch.Tensor]]:
    """Load a CLIP model.

    Parameters
    ----------
    name : str
        A model name listed by clip.available_models(), or the path to a
        model checkpoint containing the state_dict. Available models:
        RN50, RN101, RN50x4, RN50x16, RN50x64, ViT-B/32, ViT-B/16,
        ViT-L/14, ViT-L/14@336px.

    device : Union[str, torch.device]
        The device to put the loaded model. Default: "cuda" if available,
        else "cpu".

    jit : bool
        Whether to load the optimized JIT model or more hackable non-JIT
        model (default: False).

    download_root : str
        Path to download the model files; by default uses
        "~/.cache/clip".

    Returns
    -------
    model : torch.nn.Module
        The CLIP model in eval mode.

    preprocess : Callable[[PIL.Image.Image], torch.Tensor]
        A torchvision transform that converts a PIL image into a tensor
        that the returned model can take as its input.
    """

Import

import clip
# or
from clip import load

I/O Contract

Inputs

  • name (str, required): Model name from clip.available_models() (e.g. "ViT-B/32") or path to a local checkpoint file.
  • device (Union[str, torch.device], optional): Target device for the model. Default: "cuda" if available, else "cpu".
  • jit (bool, optional): Whether to load the JIT-traced model variant. Default: False.
  • download_root (str, optional): Directory for downloading model files. Default: ~/.cache/clip.

Outputs

  • model (torch.nn.Module, CLIP): The CLIP dual-encoder model in eval mode, placed on the specified device. Exposes encode_image(), encode_text(), and forward() methods.
  • preprocess (torchvision.transforms.Compose): Image preprocessing pipeline: Resize(n_px, bicubic) -> CenterCrop(n_px) -> RGB conversion -> ToTensor -> Normalize(mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711]). The resolution n_px is derived from model.visual.input_resolution.

Usage Examples

Basic Model Loading

import clip
import torch

# List available models
print(clip.available_models())
# ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64',
#  'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']

# Load a ViT-B/32 model on GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# model is ready for inference
# preprocess is a torchvision.transforms.Compose pipeline

Loading on CPU

import clip

# Force CPU loading (model weights converted to float32)
model, preprocess = clip.load("RN50", device="cpu")

Loading from Local Checkpoint

import clip

# Load from a previously downloaded checkpoint
model, preprocess = clip.load("/path/to/ViT-B-32.pt", device="cuda")

