Principle:Norrrrrrr lyn WAInjectBench Image Feature Extraction
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Feature_Engineering |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A per-image encoding step that transforms raw image files into L2-normalized CLIP embedding vectors suitable for classifier training.
Description
Image Feature Extraction loads individual images from disk, applies the CLIP preprocessing transform, passes them through the CLIP visual encoder, and L2-normalizes the resulting embeddings. Unlike batch text encoding, image encoding is performed one at a time because images vary in size and any single file may fail to load. A failed image is replaced with a zero vector, so row i of the embedding matrix still corresponds to entry i of the label array.
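The loop described above can be sketched as follows. This is a minimal, framework-free illustration of the alignment logic only, not the project's actual code: `extract_features`, `encode_one`, and the returned `failed` index list are hypothetical names introduced here, and `encode_one` stands in for the per-image CLIP call (load, preprocess, encode) that returns one raw embedding.

```python
import math
from typing import Callable, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale vec to unit L2 norm; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def extract_features(
    paths: Sequence[str],
    encode_one: Callable[[str], Sequence[float]],  # hypothetical per-image encoder
    dim: int,  # embedding width, needed to build the zero placeholder
) -> Tuple[List[List[float]], List[int]]:
    """Encode images one at a time, keeping row i aligned with paths[i]."""
    features: List[List[float]] = []
    failed: List[int] = []  # assumption: tracking failures is useful to the caller
    for i, path in enumerate(paths):
        try:
            features.append(l2_normalize(encode_one(path)))
        except Exception:  # broad on purpose: unreadable file, decode error, ...
            features.append([0.0] * dim)  # zero-vector placeholder
            failed.append(i)
    return features, failed
```

Returning the failed indices alongside the features is a design choice of this sketch: the zero rows keep the feature matrix aligned with the labels, while the index list lets the caller log or drop those samples before training if desired.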
Usage
Use this after loading training data and initializing the CLIP model. It bridges the data loading step and the classifier training step in the image embedding pipeline.
Theoretical Basis
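# Per-image CLIP encoding with normalization
# (model and preprocess are assumed to come from the CLIP initialization step)
import PIL.Image
import torch
image = preprocess(PIL.Image.open(path)).unsqueeze(0)  # add a batch dimension
with torch.no_grad():  # inference only, no gradients needed
    emb = model.encode_image(image)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # L2 normalize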
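L2 normalization places every successfully encoded embedding on the unit hypersphere, so the dot product of two embeddings equals their cosine similarity. It also removes differences in embedding magnitude, which can help a downstream classifier because only the direction of each embedding carries information. The zero-vector placeholders used for failed images are the exception: they have norm 0 and do not lie on the hypersphere.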
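The equivalence between cosine similarity and the dot product of unit vectors can be checked numerically. The snippet below is a small standalone illustration in plain Python; the helper names and the example vectors are made up for this check and do not come from the pipeline.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both must be non-zero)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def unit(a):
    """Return a scaled to unit L2 norm (a must be non-zero)."""
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


u = [3.0, 4.0]
v = [1.0, 2.0]

# After normalization, the plain dot product already is the cosine similarity.
same = math.isclose(cosine_similarity(u, v), dot(unit(u), unit(v)))
print(same)
```

In practice this means a classifier or similarity search operating on normalized embeddings can use a matrix product instead of computing norms for every pair.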