Environment:Openai CLIP Python Dependencies

From Leeroopedia
Knowledge Sources
Domains: Infrastructure, NLP, Computer_Vision
Last Updated: 2026-02-13 22:00 GMT

Overview

Python package dependencies (ftfy, regex, packaging, tqdm, scikit-learn, Pillow) required beyond PyTorch for CLIP tokenization, text cleaning, image I/O, and linear-probe evaluation.

Description

CLIP relies on several auxiliary Python packages beyond the core PyTorch stack. The ftfy library fixes Unicode text encoding issues before tokenization. The regex library (not the stdlib `re`) provides Unicode property escapes (`\p{L}`, `\p{N}`) used by the BPE tokenizer pattern. The packaging library is used for version comparison checks. The tqdm library provides download progress bars. For linear-probe workflows, scikit-learn and numpy are additionally required. Pillow (PIL) is needed for image loading and format conversion.

Usage

Use this environment specification alongside the PyTorch CUDA Runtime environment. The `ftfy`, `regex`, `packaging`, `tqdm`, and `Pillow` dependencies are required for all CLIP workflows, because `clip/clip.py` and `clip/simple_tokenizer.py` import them at module load time. The `scikit-learn` dependency is required only for the linear-probe evaluation workflow.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| Python | 3.8+ | Same as PyTorch CUDA Runtime |
| Disk | ~50 MB | For auxiliary package installations |

Dependencies

Python Packages (Core)

  • `ftfy` (text encoding repair, used by `simple_tokenizer.py`)
  • `regex` (Unicode-aware regex, used by `simple_tokenizer.py`)
  • `packaging` (version parsing, used by `clip.py`)
  • `tqdm` (progress bars for model download)
  • `Pillow` (PIL image handling)

Python Packages (Linear Probe Workflow)

  • `scikit-learn` (LogisticRegression for linear-probe classification)
  • `numpy` (feature array manipulation)

Python Packages (Development)

  • `pytest` (test runner, listed in `setup.py` extras_require)

Credentials

No credentials required for any of these packages.

Quick Install

# Core dependencies (required for all CLIP usage)
pip install ftfy regex tqdm packaging

# Install CLIP package itself
pip install git+https://github.com/openai/CLIP.git

# Additional for linear-probe evaluation
pip install scikit-learn numpy

# Development dependencies
pip install pytest

Code Evidence

Core imports in `clip/clip.py:1-11`:

import hashlib
import os
import urllib
import warnings
from packaging import version
from typing import Union, List

import torch
from PIL import Image
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
from tqdm import tqdm

ftfy and regex usage in `clip/simple_tokenizer.py:1-7`:

import gzip
import html
import os
from functools import lru_cache

import ftfy
import regex as re

Unicode property escapes in BPE pattern from `clip/simple_tokenizer.py:78`:

self.pat = re.compile(r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""", re.IGNORECASE)
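The pattern above depends on the `\p{L}` and `\p{N}` property escapes, which is why the tokenizer imports `regex` under the name `re`. The following is a minimal sketch of that difference, not code from the CLIP repository. The helper names `stdlib_re_accepts` and `letters_via_regex` are hypothetical, and the second helper returns `None` when `regex` is not installed so the snippet still runs.

```python
import importlib.util
import re as stdlib_re


def stdlib_re_accepts(pattern: str) -> bool:
    """Return True if the standard-library `re` module can compile `pattern`."""
    try:
        stdlib_re.compile(pattern)
        return True
    except stdlib_re.error:
        # stdlib `re` rejects unknown escapes such as \p with "bad escape".
        return False


def letters_via_regex(text: str):
    """Return runs of Unicode letters using `regex`, or None if it is missing."""
    if importlib.util.find_spec("regex") is None:
        return None  # assumption: caller treats None as "regex not installed"
    import regex
    return regex.findall(r"\p{L}+", text)


print(stdlib_re_accepts(r"[\p{L}]+"))  # False: stdlib re has no \p{...} support
print(letters_via_regex("héllo wörld 42"))
```

Because the stdlib module fails at compile time rather than silently matching nothing, swapping `regex` for `re` in the tokenizer would raise as soon as the pattern is built.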

requirements.txt (all 6 declared dependencies):

ftfy
packaging
regex
tqdm
torch
torchvision

setup.py extras for development from `setup.py:20`:

extras_require={'dev': ['pytest']},

PyTorch Hub dependencies from `hubconf.py:5`:

dependencies = ["torch", "torchvision", "ftfy", "regex", "tqdm"]

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ModuleNotFoundError: No module named 'ftfy'` | ftfy not installed | `pip install ftfy` |
| `ModuleNotFoundError: No module named 'regex'` | regex package not installed (stdlib `re` is not sufficient) | `pip install regex` |
| `ModuleNotFoundError: No module named 'packaging'` | packaging not installed | `pip install packaging` |
| `ModuleNotFoundError: No module named 'sklearn'` | scikit-learn not installed (needed for linear probe) | `pip install scikit-learn` |
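Each of the errors above means that a module cannot be found on the import path, so they can be caught before importing CLIP. The following is a minimal pre-flight sketch that uses only the standard library. The names `CORE`, `LINEAR_PROBE`, and `missing_packages` are hypothetical and are not part of CLIP. The mapping is needed because the import name and the pip name differ for Pillow (`PIL`) and scikit-learn (`sklearn`).

```python
import importlib.util

# Import name -> pip distribution name (illustrative mapping, taken from
# the dependency lists on this page).
CORE = {
    "ftfy": "ftfy",
    "regex": "regex",
    "packaging": "packaging",
    "tqdm": "tqdm",
    "PIL": "Pillow",
}
LINEAR_PROBE = {"sklearn": "scikit-learn", "numpy": "numpy"}


def missing_packages(modules: dict) -> list:
    """Return the pip names of modules that cannot be located for import."""
    return [
        pip_name
        for module, pip_name in modules.items()
        if importlib.util.find_spec(module) is None
    ]


for label, group in (("core", CORE), ("linear probe", LINEAR_PROBE)):
    missing = missing_packages(group)
    if missing:
        print(f"{label}: pip install {' '.join(missing)}")
    else:
        print(f"{label}: all packages found")
```

`importlib.util.find_spec` only locates a module without executing it, so the check is cheap and does not trigger a model download or CUDA initialization.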

Compatibility Notes

  • regex vs re: CLIP uses the `regex` package (not stdlib `re`) because BPE tokenization requires Unicode property escapes (`\p{L}`, `\p{N}`) which stdlib `re` does not support.
  • ftfy: Used for `fix_text()` to handle malformed Unicode in input text. This is called on every text input during tokenization via `basic_clean()`.
  • packaging: Added for compatibility with `setuptools>=70.0.0` which no longer bundles `packaging` (see commit `dcba3cb`).
  • scikit-learn: Only needed for the linear-probe evaluation pattern; not a core CLIP dependency and not listed in `requirements.txt`.
