
Heuristic:Huggingface Transformers Label Smoothing Multi Label Warning

From Leeroopedia
Knowledge Sources
Domains Training, Loss_Functions, Troubleshooting
Last Updated 2026-02-13 20:00 GMT

Overview

Label smoothing is automatically disabled for multi-label classification tasks because it is mathematically incompatible with binary cross-entropy loss.

Description

When label_smoothing_factor is set to a non-zero value in TrainingArguments, the Trainer checks whether the model's problem_type is "multi_label_classification". If so, it emits a warning and disables label smoothing. This is because label smoothing modifies the target distribution by redistributing probability mass from the correct class to all classes, which is meaningful for single-label softmax classification but nonsensical for multi-label tasks where each label is an independent binary decision.
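The guard described above can be sketched as a small standalone function. This is a simplified illustration, not the actual Trainer source; the function name `effective_label_smoothing` is hypothetical:

```python
import warnings

def effective_label_smoothing(label_smoothing_factor, problem_type):
    # Simplified sketch of the Trainer guard: non-zero smoothing is zeroed
    # out for multi-label classification, and a UserWarning is emitted.
    if label_smoothing_factor != 0 and problem_type == "multi_label_classification":
        warnings.warn(
            "Label smoothing is not compatible with multi-label classification. "
            "Disabling label smoothing for this training run.",
            UserWarning,
        )
        return 0.0
    return label_smoothing_factor
```

For single-label problems the factor passes through unchanged; only the multi-label case is zeroed.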

Usage

Be aware of this when configuring training for multi-label classification tasks (e.g., multi-label text classification, multi-label image tagging). If you set label_smoothing_factor > 0, it will be disabled for multi-label problems, with a UserWarning emitted at Trainer construction.
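For reference, a typical multi-label configuration might look like the following sketch. The checkpoint name and label count are placeholders, not recommendations:

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Placeholder checkpoint and label count; substitute your own.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=5,
    problem_type="multi_label_classification",  # selects a sigmoid/BCE loss
)

args = TrainingArguments(
    output_dir="out",
    label_smoothing_factor=0.0,  # leave at 0: any non-zero value is disabled for multi-label
)
```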

The Insight (Rule of Thumb)

  • Action: Do not set label_smoothing_factor for multi-label classification. It will be ignored.
  • Value: label_smoothing_factor=0 for multi-label; 0.1 is common for single-label.
  • Trade-off: None; this is a correctness guard, not a performance choice.

Reasoning

Label smoothing works by replacing one-hot targets [0, 0, 1, 0] with smoothed targets [0.033, 0.033, 0.9, 0.033] (for smoothing=0.1, redistributing the 0.1 of probability mass equally over the three incorrect classes). This makes sense when exactly one class is correct (softmax output). In multi-label classification, each label is independently binary (sigmoid output), so there is no single "correct class" to smooth from. Applying smoothing would incorrectly reduce the target probability for genuinely positive labels and increase it for genuinely negative labels.
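The arithmetic above, and why it breaks multi-label targets, can be checked numerically. This sketch uses the redistribute-over-incorrect-classes convention from the paragraph above; `smooth_targets` is a hypothetical helper, not a transformers API:

```python
import numpy as np

def smooth_targets(targets, epsilon=0.1):
    # Move epsilon of probability mass away from the positive entries and
    # split it evenly over the remaining classes (convention used above).
    k = targets.shape[-1]
    return (1 - epsilon) * targets + (epsilon / (k - 1)) * (1 - targets)

# Single-label one-hot target: smoothing behaves as intended.
one_hot = np.array([0.0, 0.0, 1.0, 0.0])
print(smooth_targets(one_hot))  # approximately [0.033, 0.033, 0.9, 0.033]

# Multi-label target with two positives: smoothing wrongly pulls each
# positive target down to 0.9 and pushes each negative up to ~0.033,
# even though every label is an independent binary decision.
multi_label = np.array([1.0, 0.0, 1.0, 0.0])
print(smooth_targets(multi_label))  # approximately [0.9, 0.033, 0.9, 0.033]
```

The multi-label output no longer represents valid per-label binary targets, which is exactly why the Trainer refuses to combine the two.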

Code Evidence

Warning from src/transformers/trainer.py:515-520:

if self.args.label_smoothing_factor != 0:
    if getattr(self.model.config, "problem_type", None) == "multi_label_classification":
        warnings.warn(
            "Label smoothing is not compatible with multi-label classification. "
            "Disabling label smoothing for this training run.",
            UserWarning,
        )
