Heuristic: OpenAI CLIP Linear Probe Regularization C

From Leeroopedia
Knowledge Sources
Domains Optimization, Computer_Vision
Last Updated 2026-02-13 22:00 GMT

Overview

The regularization parameter C=0.316 for scikit-learn LogisticRegression on CLIP features is a well-tuned default, but it should still be validated via a hyperparameter sweep on a validation split for each dataset.

Description

When performing linear-probe evaluation of CLIP features using scikit-learn's `LogisticRegression`, the inverse regularization strength `C` controls how strongly the classifier's weights are penalized: smaller values mean stronger regularization, larger values allow closer fitting of the training features. The CLIP README example uses `C=0.316` (approximately `10^(-0.5)`), which is a logarithmic midpoint between weak and strong regularization. This value was selected through hyperparameter tuning and works well as a starting point for CLIP's high-dimensional frozen features, but the README explicitly warns it should be determined via a sweep on a validation split for each specific dataset.

Usage

Apply this heuristic when using the linear-probe evaluation workflow with CLIP frozen features and scikit-learn. Start with `C=0.316` as a baseline, then sweep over a logarithmic range (e.g., `[0.01, 0.1, 0.316, 1.0, 10.0]`) using a validation split for your specific dataset.
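The sweep described above can be sketched as follows. This is a minimal illustration, not the README's own script: the random arrays are hypothetical stand-ins for features already extracted from a frozen CLIP encoder (e.g. 512-dimensional for ViT-B/32), and the candidate grid is the example range from this page.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for pre-extracted CLIP features; in practice these
# come from a frozen CLIP image encoder (512-dim for ViT-B/32).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 512))
train_labels = rng.integers(0, 10, size=200)
val_features = rng.normal(size=(100, 512))
val_labels = rng.integers(0, 10, size=100)

# Sweep C over a logarithmic grid, keeping the value that scores best
# on the held-out validation split.
best_c, best_acc = None, -1.0
for c in [0.01, 0.1, 0.316, 1.0, 10.0]:
    clf = LogisticRegression(random_state=0, C=c, max_iter=1000)
    clf.fit(train_features, train_labels)
    acc = clf.score(val_features, val_labels)
    if acc > best_acc:
        best_c, best_acc = c, acc

print(f"best C on validation split: {best_c} (accuracy {best_acc:.3f})")
```

With real CLIP features the selected `C` will depend on the dataset and the number of training samples, which is exactly why the README recommends the sweep.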

The Insight (Rule of Thumb)

  • Action: Set `C=0.316` in `LogisticRegression(random_state=0, C=0.316, max_iter=1000, verbose=1)`.
  • Value: `C=0.316 ≈ 10^(-0.5)`, a logarithmic midpoint.
  • Trade-off: Too low C = underfitting (too much regularization); too high C = overfitting. The optimal C varies by dataset and number of training samples.
  • Additional settings: `max_iter=1000` (enough for convergence), `random_state=0` (reproducibility).
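The "logarithmic midpoint" framing can be made concrete with a log-spaced grid; `10^(-0.5) ≈ 0.316` falls exactly halfway (in log space) between 0.1 and 1.0. The grid endpoints below are illustrative, not prescribed by the README.

```python
import numpy as np

# Seven log-spaced candidates covering 10^-2 to 10^1; the fourth entry
# is 10^(-0.5) ~= 0.316, the midpoint of the 0.1-1.0 decade in log space.
grid = np.logspace(-2, 1, 7)
print(np.round(grid, 3))  # [ 0.01   0.032  0.1    0.316  1.     3.162 10.   ]
```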

Reasoning

CLIP features are 512-dimensional (ViT-B/32) or 768-dimensional (ViT-L/14) vectors extracted from a frozen pretrained model. These features are already highly informative, so a moderate regularization strength prevents the linear classifier from overfitting to noise in the training set while still allowing it to learn the class boundaries. The value `0.316 ≈ 1/sqrt(10)` is a common choice in logarithmic sweeps. The README's warning to "determine via a hyperparameter sweep" indicates this is a reasonable default, not a universal optimum.
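The effect of `C` as an inverse regularization strength can be seen directly in the learned weights: larger `C` weakens the L2 penalty, so the coefficients shrink less. A small sketch on synthetic features (standing in for CLIP embeddings, which this snippet does not use):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic high-dimensional features standing in for frozen CLIP embeddings.
X, y = make_classification(n_samples=300, n_features=64, random_state=0)

# Fit the same linear probe at three regularization strengths and record
# the coefficient norm: it grows as C increases (weaker penalty).
norms = []
for c in [0.01, 0.316, 10.0]:
    clf = LogisticRegression(C=c, max_iter=1000, random_state=0).fit(X, y)
    norms.append(float(np.linalg.norm(clf.coef_)))
print([round(n, 3) for n in norms])  # monotonically increasing
```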

Code Evidence

Linear probe setup from `README.md:184`:

classifier = LogisticRegression(random_state=0, C=0.316, max_iter=1000, verbose=1)
classifier.fit(train_features, train_labels)

README caveat from `README.md:193`:

Note that the `C` value should be determined via a hyperparameter sweep using a validation split.
