Heuristic: OpenAI CLIP JIT vs. Non-JIT Loading
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Debugging |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Trade-off between JIT-compiled CLIP models (optimized, faster inference) and non-JIT models (hackable, inspectable, modifiable).
Description
CLIP model checkpoints are distributed as TorchScript JIT archives. The `clip.load()` function offers a `jit` parameter (default `False`) that controls whether to keep the JIT-compiled graph or rebuild the model from the state dict. JIT models preserve the original computational graph with fused operations but are opaque and difficult to modify. Non-JIT models are standard `nn.Module` instances that can be inspected, fine-tuned, or modified for research purposes.
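The contrast can be sketched on a toy model (a hypothetical stand-in for CLIP, not the CLIP architecture itself): a scripted TorchScript module round-trips through an archive like CLIP's checkpoints do, while the eager `nn.Module` keeps its submodules open for inspection and modification.

```python
import io
import torch
import torch.nn as nn

# Hypothetical toy model standing in for CLIP; the same JIT/non-JIT
# contrast applies to the real checkpoints.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)

    def forward(self, x):
        return self.proj(x)

eager = Tiny().eval()               # non-JIT: an ordinary nn.Module
scripted = torch.jit.script(eager)  # JIT: a TorchScript graph

# The non-JIT model exposes plain submodules you can swap, hook, or fine-tune.
print(type(eager.proj))  # <class 'torch.nn.modules.linear.Linear'>

# The JIT model round-trips through an archive, as CLIP checkpoints are shipped.
buf = io.BytesIO()
torch.jit.save(scripted, buf)
buf.seek(0)
loaded = torch.jit.load(buf)  # an opaque ScriptModule, hard to modify
```

Both versions compute the same function; they differ only in how much of the model remains accessible after loading.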
Usage
Use `jit=False` (the default) when you need to fine-tune, modify layers, extract intermediate features, or debug the model. Use `jit=True` only for production inference where maximum speed is needed and no model modification is required. The consistency test (`tests/test_consistency.py`) verifies that JIT and non-JIT models produce equivalent outputs within tolerance.
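Intermediate-feature extraction is one concrete reason to prefer `jit=False`: forward hooks attach cleanly to a standard `nn.Module`. A minimal sketch, using a small `nn.Sequential` as a hypothetical stand-in for a CLIP tower:

```python
import torch
import torch.nn as nn

# Stand-in model; with jit=False, clip.load() returns a standard nn.Module
# on which the same hook pattern applies.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).eval()

features = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Stash the intermediate activation under a readable key.
        features[name] = output.detach()
    return hook

model[1].register_forward_hook(save_activation("relu"))

with torch.no_grad():
    out = model(torch.randn(1, 8))

print(features["relu"].shape)  # torch.Size([1, 16])
```

On a JIT-loaded `ScriptModule`, this kind of hook-based introspection is not reliably available, which is the practical cost of the optimized graph.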
The Insight (Rule of Thumb)
- Action: Choose `jit=False` (default) for research and development; `jit=True` for optimized inference only.
- Value: `jit=False` is the default in `clip.load()`.
- Trade-off: JIT models are faster but opaque (no layer access, no gradient hooks). Non-JIT models are slower but fully hackable.
- Fallback: If a checkpoint file is not a valid JIT archive, `clip.load()` automatically falls back to non-JIT loading with a warning, even when `jit=True` was requested.
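The fallback behaviour can be demonstrated in isolation: `torch.jit.load` raises `RuntimeError` on a file that is not a TorchScript archive, which is exactly what `clip.load()` catches. The `load_flexible` helper below is a hypothetical sketch of that pattern, not part of the CLIP API.

```python
import io
import warnings
import torch
import torch.nn as nn

def load_flexible(opened_file, jit=True):
    """Hypothetical sketch of clip.load()'s try-JIT-then-fall-back pattern."""
    try:
        # A real TorchScript archive loads directly.
        model = torch.jit.load(opened_file, map_location="cpu").eval()
        return model, None
    except RuntimeError:
        # Not a JIT archive: rewind, warn if JIT was requested, load state dict.
        opened_file.seek(0)
        if jit:
            warnings.warn("Not a JIT archive. Loading as a state dict instead")
        return None, torch.load(opened_file, map_location="cpu")

# A plain state dict is not a JIT archive, so loading falls through.
buf = io.BytesIO()
torch.save(nn.Linear(3, 3).state_dict(), buf)
buf.seek(0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    model, state_dict = load_flexible(buf)
print(model is None, sorted(state_dict.keys()))  # True ['bias', 'weight']
```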
Reasoning
The CLIP docstring explicitly describes this as a choice between "the optimized JIT model or more hackable non-JIT model". JIT loading requires additional internal patching for device placement and dtype conversion (lines 144-201 of `clip.py`), which adds complexity. The non-JIT path is simpler: it calls `build_model()` to reconstruct the architecture from the state dict, then places it on the target device. The consistency test confirms both paths produce outputs within `atol=0.01, rtol=0.1`, meaning the accuracy difference is negligible.
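The same `np.allclose` consistency check can be reproduced on a toy model: scripting a module does not change its numerics, so the JIT and eager outputs agree well within the test's tolerance. (The two-layer model here is a hypothetical stand-in, not CLIP.)

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
eager = nn.Sequential(nn.Linear(4, 4), nn.Softmax(dim=-1)).eval()
scripted = torch.jit.script(eager)  # JIT version of the same model

x = torch.randn(2, 4)
with torch.no_grad():
    py_probs = eager(x).numpy()
    jit_probs = scripted(x).numpy()

# Same tolerance as tests/test_consistency.py.
assert np.allclose(jit_probs, py_probs, atol=0.01, rtol=0.1)
```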
Code Evidence
JIT parameter documentation from `clip/clip.py:106`:
```
jit : bool
    Whether to load the optimized JIT model or more hackable non-JIT model (default).
```
JIT fallback with warning from `clip/clip.py:127-136`:
```python
try:
    # loading JIT archive
    model = torch.jit.load(opened_file, map_location=device if jit else "cpu").eval()
    state_dict = None
except RuntimeError:
    # loading saved state dict
    if jit:
        warnings.warn(f"File {model_path} is not a JIT archive. Loading as a state dict instead")
        jit = False
    state_dict = torch.load(opened_file, map_location="cpu")
```
Consistency test from `tests/test_consistency.py:10-25`:
```python
def test_consistency(model_name):
    device = "cpu"
    jit_model, transform = clip.load(model_name, device=device, jit=True)
    py_model, _ = clip.load(model_name, device=device, jit=False)
    # ...
    assert np.allclose(jit_probs, py_probs, atol=0.01, rtol=0.1)
```