Environment:Openai CLIP PyTorch CUDA Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Computer_Vision |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Linux or macOS environment with PyTorch >= 1.7.1, optional CUDA GPU support, and torchvision for running OpenAI CLIP models.
Description
This environment provides the core runtime for loading and running CLIP models. It requires PyTorch 1.7.1 or later with matching torchvision. When a CUDA-capable GPU is available, CLIP automatically places the model on GPU and runs inference in fp16 (half precision); on CPU, the model is cast to fp32. The CI matrix tests against PyTorch 1.7.1, 1.9.1, and 1.10.1 on Python 3.8 with CPU-only builds.
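The device-to-precision rule described above can be captured in a small helper. Note `expected_precision` is a hypothetical function for illustration, not part of the CLIP API:

```python
def expected_precision(device) -> str:
    """Precision clip.load() leaves the model in for a given device.

    On CUDA devices the released weights stay in fp16; on CPU,
    clip.load() calls model.float() to cast everything to fp32.
    """
    return "fp32" if str(device) == "cpu" else "fp16"

print(expected_precision("cpu"))    # fp32
print(expected_precision("cuda"))   # fp16
print(expected_precision("cuda:1")) # fp16
```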
Usage
Use this environment for all CLIP workflows: zero-shot image classification, linear-probe evaluation, and prompt-engineered classification. Every Implementation page in this wiki requires this runtime as the base layer.
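A minimal zero-shot classification sketch against this runtime. The `clip` imports live inside the function so the module loads without the heavy dependencies; `build_prompts` and `zero_shot_classify` are hypothetical helper names:

```python
def build_prompts(class_names):
    """Wrap raw class names in the standard CLIP prompt template."""
    return [f"a photo of a {name}" for name in class_names]

def zero_shot_classify(image_path, class_names, model_name="ViT-B/32"):
    """Classify one image against free-text class names with CLIP.

    Requires torch, torchvision, and the clip package from
    https://github.com/openai/CLIP installed in this environment.
    """
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize(build_prompts(class_names)).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)
    return dict(zip(class_names, probs[0].tolist()))

# Example (needs a local image plus the CLIP runtime installed):
# print(zero_shot_classify("dog.jpg", ["dog", "cat", "car"]))
print(build_prompts(["dog", "cat"]))  # ['a photo of a dog', 'a photo of a cat']
```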
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) or macOS | CI uses `ubuntu-latest` |
| Hardware | NVIDIA GPU (optional) | CUDA acceleration; CPU fallback supported |
| Hardware | GPU VRAM varies by model | ViT-B/32 ~338MB weights; ViT-L/14@336px ~900MB weights |
| Disk | ~2GB free | For model cache in `~/.cache/clip` |
| Python | 3.8+ | CI tests on Python 3.8 |
Dependencies
System Packages
- CUDA toolkit (optional, for GPU acceleration)
- `conda` or `pip` (package manager)
Python Packages
- `torch` >= 1.7.1 (warning emitted if older)
- `torchvision` >= 0.8.2 (must match torch version)
- `numpy` (transitive dependency of torch)
- `Pillow` >= 5.3.0 (transitive dependency of torchvision)
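The minimum-version constraints above can be checked with a stdlib stand-in for `packaging.version.parse` (which `clip/clip.py` itself uses for its warning). Both helpers here are hypothetical:

```python
def parse_version(v: str):
    """Parse 'MAJOR.MINOR[.PATCH][+local]' into a comparable int tuple.

    Local build tags such as '+cu110' are dropped before comparing.
    """
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version satisfies the minimum constraint."""
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("1.10.1+cu113", "1.7.1"))  # True
print(meets_minimum("1.6.0", "1.7.1"))         # False
```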
Credentials
No credentials or API keys are required. Model weights are downloaded from public Azure CDN endpoints (`openaipublic.azureedge.net`) without authentication.
Quick Install
# GPU install (CUDA 11.0 example)
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
# CPU-only install
conda install --yes -c pytorch pytorch=1.7.1 torchvision cpuonly
# Or via pip (default Linux wheels bundle CUDA support; macOS wheels are CPU-only)
pip install torch torchvision
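After either install path, a quick sanity check confirms the runtime matches the requirements above. `runtime_report` is a hypothetical helper that degrades gracefully when torch is absent:

```python
import sys

def runtime_report() -> dict:
    """Report Python compatibility and, if torch imports, its version
    and whether CUDA is actually usable on this machine."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
        report["cuda_available"] = False
    return report

print(runtime_report())
```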
Code Evidence
PyTorch version check from `clip/clip.py:23-24`:
if version.parse(torch.__version__) < version.parse("1.7.1"):
warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Automatic CUDA/CPU device selection from `clip/clip.py:94`:
def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu", jit: bool = False, download_root: str = None):
Dtype handling for CPU vs GPU from `clip/clip.py:140-141`:
if str(device) == "cpu":
model.float()
Torch version-conditional dtype in tokenize from `clip/clip.py:231-234`:
if version.parse(torch.__version__) < version.parse("1.8.0"):
result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
else:
result = torch.zeros(len(all_tokens), context_length, dtype=torch.int)
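The version branch above reduces to a simple rule, sketched here with a hypothetical helper (`tokenize_dtype` is not part of the CLIP API):

```python
def tokenize_dtype(torch_version: str) -> str:
    """Name of the dtype clip.tokenize allocates its result buffer with.

    Before torch 1.8.0, index_select required int64 indices, so the
    buffer is torch.long; newer versions can use torch.int.
    """
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    return "torch.long" if (major, minor) < (1, 8) else "torch.int"

print(tokenize_dtype("1.7.1"))   # torch.long
print(tokenize_dtype("1.10.1"))  # torch.int
```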
CI test matrix from `.github/workflows/test.yml:13-25`:
matrix:
python-version: [3.8]
pytorch-version: [1.7.1, 1.9.1, 1.10.1]
include:
- python-version: 3.8
pytorch-version: 1.7.1
torchvision-version: 0.8.2
- python-version: 3.8
pytorch-version: 1.9.1
torchvision-version: 0.10.1
- python-version: 3.8
pytorch-version: 1.10.1
torchvision-version: 0.11.2
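The matched pairs in the CI matrix can be turned into pinned install commands. `pinned_install_command` is a hypothetical convenience helper:

```python
# Matched torch/torchvision pairs taken from the CI matrix
TESTED_PAIRS = {
    "1.7.1": "0.8.2",
    "1.9.1": "0.10.1",
    "1.10.1": "0.11.2",
}

def pinned_install_command(torch_version: str) -> str:
    """Build a pip command for a CI-tested torch/torchvision pair.

    Raises KeyError for versions the matrix does not cover.
    """
    tv = TESTED_PAIRS[torch_version]
    return f"pip install torch=={torch_version} torchvision=={tv}"

print(pinned_install_command("1.10.1"))
# pip install torch==1.10.1 torchvision==0.11.2
```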
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `UserWarning: PyTorch version 1.7.1 or higher is recommended` | PyTorch version below 1.7.1 | Upgrade PyTorch: `pip install "torch>=1.7.1"` (quote the specifier so the shell does not treat `>` as redirection) |
| `RuntimeError: Model {name} not found` | Invalid model name passed to `clip.load()` | Use one of `clip.available_models()`: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT-B/32, ViT-B/16, ViT-L/14, ViT-L/14@336px |
| `RuntimeError: Model has been downloaded but the SHA256 checksum does not not match` | Corrupt or incomplete download (the doubled "not" is verbatim in `clip/clip.py`) | Delete the cached file in `~/.cache/clip/` and re-download |
| `RuntimeError: {path} exists and is not a regular file` | Download target path is a directory | Remove the conflicting directory at the path |
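The checksum-mismatch recovery can be automated. This is a hypothetical helper (`clear_corrupt_download` is not part of CLIP); removing the bad file makes `clip.load()` re-download it:

```python
import hashlib
from pathlib import Path

def clear_corrupt_download(filename: str, expected_sha256: str,
                           cache_dir: str = "~/.cache/clip") -> bool:
    """Delete a cached checkpoint whose SHA256 does not match.

    Hashes the file incrementally (checkpoints can be hundreds of MB)
    and returns True if a mismatching file was deleted.
    """
    path = Path(cache_dir).expanduser() / filename
    if not path.is_file():
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        path.unlink()
        return True
    return False
```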
Compatibility Notes
- CPU-only: Fully supported. Models auto-cast to fp32. JIT models require additional dtype patching (handled internally by `clip.load()`).
- CUDA GPU: Models run in fp16 by default for memory efficiency. No minimum compute capability specified.
- torchvision InterpolationMode: Older torchvision versions lack `InterpolationMode` enum; CLIP falls back to `Image.BICUBIC` from PIL (`clip/clip.py:16-20`).
- torch < 1.8.0: Tokenizer returns `LongTensor` instead of `IntTensor` due to older `index_select` requirements (`clip/clip.py:231-234`).
- PyTorch Hub: CLIP can be loaded via `torch.hub.load()` using `hubconf.py`, which declares dependencies: `torch`, `torchvision`, `ftfy`, `regex`, `tqdm`.
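A sketch of the Hub loading path mentioned above. The entry-point name here is an assumption; enumerate the real ones with `torch.hub.list("openai/CLIP")` before relying on it:

```python
def load_via_hub(entry: str = "ViT_B_32"):
    """Load CLIP through PyTorch Hub instead of pip-installing the repo.

    Needs network access plus the hubconf.py dependencies
    (torch, torchvision, ftfy, regex, tqdm). The entry name
    "ViT_B_32" is an assumption, not confirmed against hubconf.py.
    """
    import torch
    return torch.hub.load("openai/CLIP", entry)

# model = load_via_hub()  # uncomment in a connected environment
print(callable(load_via_hub))  # True
```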
Related Pages
- Implementation:Openai_CLIP_Clip_Load
- Implementation:Openai_CLIP_CLIP_Encode_Image
- Implementation:Openai_CLIP_CLIP_Encode_Text
- Implementation:Openai_CLIP_CLIP_Forward
- Implementation:Openai_CLIP_CLIP_Encode_Image_For_Linear_Probe
- Implementation:Openai_CLIP_Transform
- Implementation:Openai_CLIP_Zeroshot_Classifier
- Implementation:Openai_CLIP_Accuracy_Function