Principle: OpenAI CLIP Linear Classification
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Evaluation, Classification |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
A linear classification evaluation protocol that trains a simple logistic regression model on frozen pretrained features to measure representation quality.
Description
Linear Classification (also called "linear probing") is a standard evaluation methodology for pretrained visual representations. A lightweight linear classifier (logistic regression) is trained on top of frozen feature vectors extracted from a pretrained model, and its accuracy on a held-out test set is reported as a measure of how much useful task-specific information the representations contain.
The protocol consists of:
- Training: Fit a multinomial logistic regression classifier on the L2-normalized image features and corresponding labels from the training split.
- Prediction: Use the trained classifier to predict class labels for test-split features.
- Evaluation: Compute classification accuracy as the fraction of correctly predicted test samples.
Key design choices include the regularization strength C (inverse regularization), maximum iterations for convergence, and a fixed random seed for reproducibility.
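The protocol and design choices above can be sketched with scikit-learn. This is a minimal illustration, not the exact CLIP evaluation code: the random features stand in for frozen pretrained embeddings, and the shapes, class count, and helper name `l2_normalize` are chosen for the example; `C=0.316` follows the value quoted later from the CLIP README.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Stand-in features: in practice these come from a frozen pretrained encoder.
X_train = rng.normal(size=(200, 512)).astype(np.float32)
y_train = rng.integers(0, 4, size=200)  # 4 hypothetical classes
X_test = rng.normal(size=(100, 512)).astype(np.float32)
y_test = rng.integers(0, 4, size=100)

def l2_normalize(x):
    # Project each feature vector onto the unit sphere.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

X_train, X_test = l2_normalize(X_train), l2_normalize(X_test)

# Training: fit a logistic regression probe on the frozen features.
# C is the inverse regularization strength; max_iter bounds the solver.
clf = LogisticRegression(C=0.316, max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Prediction + Evaluation: fraction of correctly predicted test samples.
accuracy = (clf.predict(X_test) == y_test).mean()
```

With real pretrained features the accuracy is the reported linear-probe score; with the random stand-ins here it is only chance-level.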
Usage
Use this principle to evaluate the quality of pretrained image representations on a classification benchmark. It is the standard "linear probe" metric reported in papers like CLIP, DINO, and MAE. The representation quality is measured by how well a simple linear boundary separates classes in the embedding space.
Theoretical Basis
Logistic regression fits a linear decision boundary in the feature space by minimizing the regularized cross-entropy loss:
# Multinomial logistic regression
# minimize: -sum(log P(y_i | x_i; W, b)) + (1/(2C)) * ||W||^2
# where P(y = k | x) = softmax(W @ x + b)[k]
# The inverse regularization parameter C controls the trade-off:
# - Higher C: less regularization; fits the training data more closely
# - Lower C: more regularization; simpler decision boundary
# The CLIP README uses C=0.316 (approximately sqrt(0.1))
The key assumption is that a good pretrained representation should make classes linearly separable without needing non-linear transformations. A higher linear probe accuracy indicates that the representations have captured more task-relevant structure.
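The softmax parameterization above can be checked numerically. This is a toy sketch with hand-picked weights, not a trained model; the 3-class matrix `W`, bias `b`, and input `x` are arbitrary illustration values.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability before exponentiating.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy linear decision boundary: logits = W @ x + b for 3 classes, 2-dim input.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 0.5])

p = softmax(W @ x + b)       # P(y = k | x) for each class k
pred = int(np.argmax(p))     # predicted class = highest-probability class
```

The probabilities sum to one by construction, and the predicted class is simply the row of `W` whose linear score on `x` is largest, which is exactly the "linear boundary" the probe accuracy measures.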