Heuristic: OpenAI CLIP JIT vs. Non-JIT Loading
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Debugging |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Trade-off between JIT-compiled CLIP models (optimized, faster inference) and non-JIT models (hackable, inspectable, modifiable).
Description
CLIP model checkpoints are distributed as TorchScript JIT archives. The `clip.load()` function offers a `jit` parameter (default `False`) that controls whether to keep the JIT-compiled graph or rebuild the model from the state dict. JIT models preserve the original computational graph with fused operations but are opaque and difficult to modify. Non-JIT models are standard `nn.Module` instances that can be inspected, fine-tuned, or modified for research purposes.
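The contrast can be sketched on a toy model (a hypothetical stand-in for CLIP, not the CLIP architecture itself): a scripted TorchScript module round-trips through an archive like CLIP's checkpoints do, while the eager `nn.Module` keeps its submodules open for inspection and modification.

```python
import io
import torch
import torch.nn as nn

# Hypothetical toy model standing in for CLIP; the same JIT/non-JIT
# contrast applies to the real checkpoints.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)

    def forward(self, x):
        return self.proj(x)

eager = Tiny().eval()               # non-JIT: an ordinary nn.Module
scripted = torch.jit.script(eager)  # JIT: a TorchScript graph

# The non-JIT model exposes plain submodules you can swap, hook, or fine-tune.
print(type(eager.proj))  # <class 'torch.nn.modules.linear.Linear'>

# The JIT model round-trips through an archive, as CLIP checkpoints are shipped.
buf = io.BytesIO()
torch.jit.save(scripted, buf)
buf.seek(0)
loaded = torch.jit.load(buf)  # an opaque ScriptModule, hard to modify
```

Both versions compute the same function; they differ only in how much of the model remains accessible after loading.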
Usage
Use `jit=False` (the default) when you need to fine-tune, modify layers, extract intermediate features, or debug the model. Use `jit=True` only for production inference where maximum speed is needed and no model modification is required. The consistency test (`tests/test_consistency.py`) verifies that JIT and non-JIT models produce equivalent outputs within tolerance.
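Intermediate-feature extraction is one concrete reason to prefer `jit=False`: forward hooks attach cleanly to a standard `nn.Module`. A minimal sketch, using a small `nn.Sequential` as a hypothetical stand-in for a CLIP tower:

```python
import torch
import torch.nn as nn

# Stand-in model; with jit=False, clip.load() returns a standard nn.Module
# on which the same hook pattern applies.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).eval()

features = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Stash the intermediate activation under a readable key.
        features[name] = output.detach()
    return hook

model[1].register_forward_hook(save_activation("relu"))

with torch.no_grad():
    out = model(torch.randn(1, 8))

print(features["relu"].shape)  # torch.Size([1, 16])
```

On a JIT-loaded `ScriptModule`, this kind of hook-based introspection is not reliably available, which is the practical cost of the optimized graph.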
The Insight (Rule of Thumb)
- Action: Choose `jit=False` (default) for research and development; `jit=True` for optimized inference only.
- Value: `jit=False` is the default in `clip.load()`.
- Trade-off: JIT models are faster but opaque (no layer access, no gradient hooks). Non-JIT models are slower but fully hackable.
- Fallback: If a checkpoint file is not a valid JIT archive, `clip.load()` automatically falls back to non-JIT loading with a warning, even when `jit=True` was requested.
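The fallback behaviour can be demonstrated in isolation: `torch.jit.load` raises `RuntimeError` on a file that is not a TorchScript archive, which is exactly what `clip.load()` catches. The `load_flexible` helper below is a hypothetical sketch of that pattern, not part of the CLIP API.

```python
import io
import warnings
import torch
import torch.nn as nn

def load_flexible(opened_file, jit=True):
    """Hypothetical sketch of clip.load()'s try-JIT-then-fall-back pattern."""
    try:
        # A real TorchScript archive loads directly.
        model = torch.jit.load(opened_file, map_location="cpu").eval()
        return model, None
    except RuntimeError:
        # Not a JIT archive: rewind, warn if JIT was requested, load state dict.
        opened_file.seek(0)
        if jit:
            warnings.warn("Not a JIT archive. Loading as a state dict instead")
        return None, torch.load(opened_file, map_location="cpu")

# A plain state dict is not a JIT archive, so loading falls through.
buf = io.BytesIO()
torch.save(nn.Linear(3, 3).state_dict(), buf)
buf.seek(0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    model, state_dict = load_flexible(buf)
print(model is None, sorted(state_dict.keys()))  # True ['bias', 'weight']
```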
Reasoning
The CLIP docstring explicitly describes this as a choice between "the optimized JIT model or more hackable non-JIT model". JIT loading requires additional internal patching for device placement and dtype conversion (lines 144-201 of `clip.py`), which adds complexity. The non-JIT path is simpler: it calls `build_model()` to reconstruct the architecture from the state dict, then places it on the target device. The consistency test confirms both paths produce outputs within `atol=0.01, rtol=0.1`, meaning the accuracy difference is negligible.
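The same `np.allclose` consistency check can be reproduced on a toy model: scripting a module does not change its numerics, so the JIT and eager outputs agree well within the test's tolerance. (The two-layer model here is a hypothetical stand-in, not CLIP.)

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
eager = nn.Sequential(nn.Linear(4, 4), nn.Softmax(dim=-1)).eval()
scripted = torch.jit.script(eager)  # JIT version of the same model

x = torch.randn(2, 4)
with torch.no_grad():
    py_probs = eager(x).numpy()
    jit_probs = scripted(x).numpy()

# Same tolerance as tests/test_consistency.py.
assert np.allclose(jit_probs, py_probs, atol=0.01, rtol=0.1)
```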
Code Evidence
JIT parameter documentation from `clip/clip.py:106`:
```
jit : bool
    Whether to load the optimized JIT model or more hackable non-JIT model (default).
```
JIT fallback with warning from `clip/clip.py:127-136`:
```python
try:
    # loading JIT archive
    model = torch.jit.load(opened_file, map_location=device if jit else "cpu").eval()
    state_dict = None
except RuntimeError:
    # loading saved state dict
    if jit:
        warnings.warn(f"File {model_path} is not a JIT archive. Loading as a state dict instead")
        jit = False
    state_dict = torch.load(opened_file, map_location="cpu")
```
Consistency test from `tests/test_consistency.py:10-25`:
```python
def test_consistency(model_name):
    device = "cpu"
    jit_model, transform = clip.load(model_name, device=device, jit=True)
    py_model, _ = clip.load(model_name, device=device, jit=False)
    # ...
    assert np.allclose(jit_probs, py_probs, atol=0.01, rtol=0.1)
```