Environment:Openai CLIP PyTorch CUDA Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Computer_Vision |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Linux or macOS environment with PyTorch >= 1.7.1, optional CUDA GPU support, and torchvision for running OpenAI CLIP models.
Description
This environment provides the core runtime for loading and running CLIP models. It requires PyTorch 1.7.1 or later with matching torchvision. When a CUDA-capable GPU is available, CLIP automatically places the model on GPU and runs inference in fp16 (half precision); on CPU, the model is cast to fp32. The CI matrix tests against PyTorch 1.7.1, 1.9.1, and 1.10.1 on Python 3.8 with CPU-only builds.
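The device-to-precision rule described above can be captured in a small helper. Note `expected_precision` is a hypothetical function for illustration, not part of the CLIP API:

```python
def expected_precision(device) -> str:
    """Precision clip.load() leaves the model in for a given device.

    On CUDA devices the released weights stay in fp16; on CPU,
    clip.load() calls model.float() to cast everything to fp32.
    """
    return "fp32" if str(device) == "cpu" else "fp16"

print(expected_precision("cpu"))    # fp32
print(expected_precision("cuda"))   # fp16
print(expected_precision("cuda:1")) # fp16
```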
Usage
Use this environment for all CLIP workflows: zero-shot image classification, linear-probe evaluation, and prompt-engineered classification. Every Implementation page in this wiki requires this runtime as the base layer.
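A minimal zero-shot classification sketch against this runtime. The `clip` imports live inside the function so the module loads without the heavy dependencies; `build_prompts` and `zero_shot_classify` are hypothetical helper names:

```python
def build_prompts(class_names):
    """Wrap raw class names in the standard CLIP prompt template."""
    return [f"a photo of a {name}" for name in class_names]

def zero_shot_classify(image_path, class_names, model_name="ViT-B/32"):
    """Classify one image against free-text class names with CLIP.

    Requires torch, torchvision, and the clip package from
    https://github.com/openai/CLIP installed in this environment.
    """
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize(build_prompts(class_names)).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)
    return dict(zip(class_names, probs[0].tolist()))

# Example (needs a local image plus the CLIP runtime installed):
# print(zero_shot_classify("dog.jpg", ["dog", "cat", "car"]))
print(build_prompts(["dog", "cat"]))  # ['a photo of a dog', 'a photo of a cat']
```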
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) or macOS | CI uses `ubuntu-latest` |
| Hardware | NVIDIA GPU (optional) | CUDA acceleration; CPU fallback supported |
| Hardware | GPU VRAM varies by model | ViT-B/32 ~338MB weights; ViT-L/14@336px ~900MB weights |
| Disk | ~2GB free | For model cache in `~/.cache/clip` |
| Python | 3.8+ | CI tests on Python 3.8 |
Dependencies
System Packages
- CUDA toolkit (optional, for GPU acceleration)
- `conda` or `pip` (package manager)
Python Packages
- `torch` >= 1.7.1 (warning emitted if older)
- `torchvision` >= 0.8.2 (must match torch version)
- `numpy` (transitive dependency of torch)
- `Pillow` >= 5.3.0 (transitive dependency of torchvision)
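The minimum-version constraints above can be checked with a stdlib stand-in for `packaging.version.parse` (which `clip/clip.py` itself uses for its warning). Both helpers here are hypothetical:

```python
def parse_version(v: str):
    """Parse 'MAJOR.MINOR[.PATCH][+local]' into a comparable int tuple.

    Local build tags such as '+cu110' are dropped before comparing.
    """
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version satisfies the minimum constraint."""
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("1.10.1+cu113", "1.7.1"))  # True
print(meets_minimum("1.6.0", "1.7.1"))         # False
```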
Credentials
No credentials or API keys are required. Model weights are downloaded from public Azure CDN endpoints (`openaipublic.azureedge.net`) without authentication.
Quick Install
# GPU install (CUDA 11.0 example)
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
# CPU-only install
conda install --yes -c pytorch pytorch=1.7.1 torchvision cpuonly
# Or via pip (default Linux wheels bundle CUDA support; macOS wheels are CPU-only)
pip install torch torchvision
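After either install path, a quick sanity check confirms the runtime matches the requirements above. `runtime_report` is a hypothetical helper that degrades gracefully when torch is absent:

```python
import sys

def runtime_report() -> dict:
    """Report Python compatibility and, if torch imports, its version
    and whether CUDA is actually usable on this machine."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
        report["cuda_available"] = False
    return report

print(runtime_report())
```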
Code Evidence
PyTorch version check from `clip/clip.py:23-24`:
if version.parse(torch.__version__) < version.parse("1.7.1"):
warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Automatic CUDA/CPU device selection from `clip/clip.py:94`:
def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu", jit: bool = False, download_root: str = None):
Dtype handling for CPU vs GPU from `clip/clip.py:140-141`:
if str(device) == "cpu":
model.float()
Torch version-conditional dtype in tokenize from `clip/clip.py:231-234`:
if version.parse(torch.__version__) < version.parse("1.8.0"):
result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
else:
result = torch.zeros(len(all_tokens), context_length, dtype=torch.int)
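The version branch above reduces to a simple rule, sketched here with a hypothetical helper (`tokenize_dtype` is not part of the CLIP API):

```python
def tokenize_dtype(torch_version: str) -> str:
    """Name of the dtype clip.tokenize allocates its result buffer with.

    Before torch 1.8.0, index_select required int64 indices, so the
    buffer is torch.long; newer versions can use torch.int.
    """
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    return "torch.long" if (major, minor) < (1, 8) else "torch.int"

print(tokenize_dtype("1.7.1"))   # torch.long
print(tokenize_dtype("1.10.1"))  # torch.int
```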
CI test matrix from `.github/workflows/test.yml:13-25`:
matrix:
python-version: [3.8]
pytorch-version: [1.7.1, 1.9.1, 1.10.1]
include:
- python-version: 3.8
pytorch-version: 1.7.1
torchvision-version: 0.8.2
- python-version: 3.8
pytorch-version: 1.9.1
torchvision-version: 0.10.1
- python-version: 3.8
pytorch-version: 1.10.1
torchvision-version: 0.11.2
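The matched pairs in the CI matrix can be turned into pinned install commands. `pinned_install_command` is a hypothetical convenience helper:

```python
# Matched torch/torchvision pairs taken from the CI matrix
TESTED_PAIRS = {
    "1.7.1": "0.8.2",
    "1.9.1": "0.10.1",
    "1.10.1": "0.11.2",
}

def pinned_install_command(torch_version: str) -> str:
    """Build a pip command for a CI-tested torch/torchvision pair.

    Raises KeyError for versions the matrix does not cover.
    """
    tv = TESTED_PAIRS[torch_version]
    return f"pip install torch=={torch_version} torchvision=={tv}"

print(pinned_install_command("1.10.1"))
# pip install torch==1.10.1 torchvision==0.11.2
```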
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `UserWarning: PyTorch version 1.7.1 or higher is recommended` | PyTorch version below 1.7.1 | Upgrade PyTorch: `pip install "torch>=1.7.1"` (quote the specifier so the shell does not treat `>` as redirection) |
| `RuntimeError: Model {name} not found` | Invalid model name passed to `clip.load()` | Use one of `clip.available_models()`: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT-B/32, ViT-B/16, ViT-L/14, ViT-L/14@336px |
| `RuntimeError: Model has been downloaded but the SHA256 checksum does not not match` | Corrupt or incomplete download (the doubled "not" is verbatim in `clip/clip.py`) | Delete the cached file in `~/.cache/clip/` and re-download |
| `RuntimeError: {path} exists and is not a regular file` | Download target path is a directory | Remove the conflicting directory at the path |
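The checksum-mismatch recovery can be automated. This is a hypothetical helper (`clear_corrupt_download` is not part of CLIP); removing the bad file makes `clip.load()` re-download it:

```python
import hashlib
from pathlib import Path

def clear_corrupt_download(filename: str, expected_sha256: str,
                           cache_dir: str = "~/.cache/clip") -> bool:
    """Delete a cached checkpoint whose SHA256 does not match.

    Hashes the file incrementally (checkpoints can be hundreds of MB)
    and returns True if a mismatching file was deleted.
    """
    path = Path(cache_dir).expanduser() / filename
    if not path.is_file():
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        path.unlink()
        return True
    return False
```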
Compatibility Notes
- CPU-only: Fully supported. Models auto-cast to fp32. JIT models require additional dtype patching (handled internally by `clip.load()`).
- CUDA GPU: Models run in fp16 by default for memory efficiency. No minimum compute capability specified.
- torchvision InterpolationMode: Older torchvision versions lack `InterpolationMode` enum; CLIP falls back to `Image.BICUBIC` from PIL (`clip/clip.py:16-20`).
- torch < 1.8.0: Tokenizer returns `LongTensor` instead of `IntTensor` due to older `index_select` requirements (`clip/clip.py:231-234`).
- PyTorch Hub: CLIP can be loaded via `torch.hub.load()` using `hubconf.py`, which declares dependencies: `torch`, `torchvision`, `ftfy`, `regex`, `tqdm`.
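A sketch of the Hub loading path mentioned above. The entry-point name here is an assumption; enumerate the real ones with `torch.hub.list("openai/CLIP")` before relying on it:

```python
def load_via_hub(entry: str = "ViT_B_32"):
    """Load CLIP through PyTorch Hub instead of pip-installing the repo.

    Needs network access plus the hubconf.py dependencies
    (torch, torchvision, ftfy, regex, tqdm). The entry name
    "ViT_B_32" is an assumption, not confirmed against hubconf.py.
    """
    import torch
    return torch.hub.load("openai/CLIP", entry)

# model = load_via_hub()  # uncomment in a connected environment
print(callable(load_via_hub))  # True
```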
Related Pages
- Implementation:Openai_CLIP_Clip_Load
- Implementation:Openai_CLIP_CLIP_Encode_Image
- Implementation:Openai_CLIP_CLIP_Encode_Text
- Implementation:Openai_CLIP_CLIP_Forward
- Implementation:Openai_CLIP_CLIP_Encode_Image_For_Linear_Probe
- Implementation:Openai_CLIP_Transform
- Implementation:Openai_CLIP_Zeroshot_Classifier
- Implementation:Openai_CLIP_Accuracy_Function