Principle: OpenAI CLIP Linear Classification
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Evaluation, Classification |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
A linear classification evaluation protocol that trains a simple logistic regression model on frozen pretrained features to measure representation quality.
Description
Linear Classification (also called "linear probing") is a standard evaluation methodology for pretrained visual representations. A lightweight linear classifier (logistic regression) is trained on top of frozen feature vectors extracted from a pretrained model, and its accuracy on a held-out test set is reported as a measure of how much useful task-specific information the representations contain.
The protocol consists of:
- Training: Fit a multinomial logistic regression classifier on the L2-normalized image features and corresponding labels from the training split.
- Prediction: Use the trained classifier to predict class labels for test-split features.
- Evaluation: Compute classification accuracy as the fraction of correctly predicted test samples.
Key design choices include the regularization strength C (inverse regularization), maximum iterations for convergence, and a fixed random seed for reproducibility.
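The protocol and design choices above can be sketched with scikit-learn. This is a minimal illustration, not the exact CLIP evaluation code: the random features stand in for frozen pretrained embeddings, and the shapes, class count, and helper name `l2_normalize` are chosen for the example; `C=0.316` follows the value quoted later from the CLIP README.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Stand-in features: in practice these come from a frozen pretrained encoder.
X_train = rng.normal(size=(200, 512)).astype(np.float32)
y_train = rng.integers(0, 4, size=200)  # 4 hypothetical classes
X_test = rng.normal(size=(100, 512)).astype(np.float32)
y_test = rng.integers(0, 4, size=100)

def l2_normalize(x):
    # Project each feature vector onto the unit sphere.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

X_train, X_test = l2_normalize(X_train), l2_normalize(X_test)

# Training: fit a logistic regression probe on the frozen features.
# C is the inverse regularization strength; max_iter bounds the solver.
clf = LogisticRegression(C=0.316, max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Prediction + Evaluation: fraction of correctly predicted test samples.
accuracy = (clf.predict(X_test) == y_test).mean()
```

With real pretrained features the accuracy is the reported linear-probe score; with the random stand-ins here it is only chance-level.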
Usage
Use this principle to evaluate the quality of pretrained image representations on a classification benchmark. It is the standard "linear probe" metric reported in papers like CLIP, DINO, and MAE. The representation quality is measured by how well a simple linear boundary separates classes in the embedding space.
Theoretical Basis
Logistic regression fits a linear decision boundary in the feature space by minimizing the regularized cross-entropy loss:
# Multinomial logistic regression
# minimize: -sum(log P(y_i | x_i; W, b)) + (1/(2C)) * ||W||^2
# where P(y = k | x) = softmax(W @ x + b)[k]
# The inverse regularization parameter C controls the trade-off:
# - Higher C: less regularization; fits the training data more closely
# - Lower C: more regularization; simpler decision boundary
# The CLIP README uses C=0.316 (approximately sqrt(0.1))
The key assumption is that a good pretrained representation should make classes linearly separable without needing non-linear transformations. A higher linear probe accuracy indicates that the representations have captured more task-relevant structure.
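The softmax parameterization above can be checked numerically. This is a toy sketch with hand-picked weights, not a trained model; the 3-class matrix `W`, bias `b`, and input `x` are arbitrary illustration values.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability before exponentiating.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy linear decision boundary: logits = W @ x + b for 3 classes, 2-dim input.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 0.5])

p = softmax(W @ x + b)       # P(y = k | x) for each class k
pred = int(np.argmax(p))     # predicted class = highest-probability class
```

The probabilities sum to one by construction, and the predicted class is simply the row of `W` whose linear score on `x` is largest, which is exactly the "linear boundary" the probe accuracy measures.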