Heuristic:Huggingface Transformers Label Smoothing Multi Label Warning
| Knowledge Sources | |
|---|---|
| Domains | Training, Loss_Functions, Troubleshooting |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Label smoothing is automatically disabled for multi-label classification tasks because smoothing a single-label target distribution does not carry over to the per-label binary cross-entropy loss used there.
Description
When label_smoothing_factor is set to a non-zero value in TrainingArguments, the Trainer checks whether the model's problem_type is "multi_label_classification". If so, it emits a warning and disables label smoothing. This is because label smoothing modifies the target distribution by redistributing probability mass from the correct class to all classes, which is meaningful for single-label softmax classification but nonsensical for multi-label tasks where each label is an independent binary decision.
Usage
Keep this in mind when configuring training for multi-label classification tasks (e.g., multi-label text classification, multi-label image tagging). If you set label_smoothing_factor > 0, the Trainer does not apply it for multi-label problems: it emits a warning and disables label smoothing.
The Insight (Rule of Thumb)
- Action: Do not set label_smoothing_factor for multi-label classification. It will be ignored.
- Value: label_smoothing_factor=0 for multi-label; 0.1 is common for single-label.
- Trade-off: None; this is a correctness guard, not a performance choice.
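The rule of thumb can be applied on the user side before any training arguments are built. The sketch below is a minimal illustration, not part of the library: `training_kwargs` and its `single_label_smoothing` parameter are hypothetical names, and only the `output_dir` and `label_smoothing_factor` keys correspond to TrainingArguments fields.

```python
MULTI_LABEL = "multi_label_classification"


def training_kwargs(problem_type, output_dir="out", single_label_smoothing=0.1):
    """Build keyword arguments intended for TrainingArguments (hypothetical helper).

    Label smoothing is forced to 0.0 for multi-label problems and set to
    `single_label_smoothing` otherwise.
    """
    multi_label = problem_type == MULTI_LABEL
    return {
        "output_dir": output_dir,
        "label_smoothing_factor": 0.0 if multi_label else single_label_smoothing,
    }


print(training_kwargs("multi_label_classification"))
# {'output_dir': 'out', 'label_smoothing_factor': 0.0}
print(training_kwargs("single_label_classification"))
# {'output_dir': 'out', 'label_smoothing_factor': 0.1}
```

The resulting dictionary could then be unpacked into TrainingArguments, which keeps the multi-label case explicit instead of relying on the warning.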
Reasoning
Label smoothing replaces one-hot targets such as [0, 0, 1, 0] with smoothed targets [0.025, 0.025, 0.925, 0.025] (for smoothing=0.1, with the 0.1 spread uniformly over all four classes). This makes sense when exactly one class is correct (softmax output), because the smoothed targets still form a probability distribution. In multi-label classification, each label is an independent binary decision (sigmoid output), so there is no single "correct class" to take probability mass from. Applying the same smoothing would incorrectly lower the target for genuinely positive labels and raise it for genuinely negative ones.
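The arithmetic can be checked with a few lines of plain Python. This is a sketch under one assumption: it uses the uniform variant, where the smoothing factor is spread over all K classes. Other variants spread it only over the K-1 incorrect classes, which gives 0.9 and about 0.033 for the same input. The function name `smooth_targets` is illustrative.

```python
def smooth_targets(targets, epsilon):
    """Uniform label smoothing: (1 - epsilon) * target + epsilon / K."""
    k = len(targets)
    return [(1.0 - epsilon) * t + epsilon / k for t in targets]


one_hot = [0.0, 0.0, 1.0, 0.0]    # single-label: exactly one class is correct
multi_hot = [1.0, 0.0, 1.0, 0.0]  # multi-label: two labels are independently on

smoothed_single = smooth_targets(one_hot, 0.1)
smoothed_multi = smooth_targets(multi_hot, 0.1)

print([round(v, 3) for v in smoothed_single])  # [0.025, 0.025, 0.925, 0.025]
print(round(sum(smoothed_single), 3))          # 1.0
print([round(v, 3) for v in smoothed_multi])   # [0.925, 0.025, 0.925, 0.025]
print(round(sum(smoothed_multi), 3))           # 1.9
```

The single-label result still sums to 1, so it remains a valid target distribution for a softmax output. The multi-hot result sums to 1.9 and is not a distribution at all, which illustrates why the softmax-style formula has no natural reading for independent sigmoid outputs.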
Code Evidence
Warning from src/transformers/trainer.py:515-520:
    if self.args.label_smoothing_factor != 0:
        if getattr(self.model.config, "problem_type", None) == "multi_label_classification":
            warnings.warn(
                "Label smoothing is not compatible with multi-label classification. "
                "Disabling label smoothing for this training run.",
                UserWarning,
            )
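To see the guard in isolation, the same check can be reproduced outside the Trainer. The following is a standalone sketch, not the library's implementation: `disable_smoothing_if_multi_label` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins for TrainingArguments and the model config, and resetting the factor to 0.0 is an assumption about how the disabling step might look, since the excerpt above shows only the warning.

```python
import warnings
from types import SimpleNamespace


def disable_smoothing_if_multi_label(args, config):
    """Warn and zero out label smoothing when the model is multi-label.

    `args` and `config` are hypothetical stand-ins for TrainingArguments
    and the model config; only the two attributes read below are needed.
    """
    if args.label_smoothing_factor != 0:
        if getattr(config, "problem_type", None) == "multi_label_classification":
            warnings.warn(
                "Label smoothing is not compatible with multi-label classification. "
                "Disabling label smoothing for this training run.",
                UserWarning,
            )
            # Assumed disabling step: reset the factor so no smoothing is applied.
            args.label_smoothing_factor = 0.0
    return args


args = SimpleNamespace(label_smoothing_factor=0.1)
config = SimpleNamespace(problem_type="multi_label_classification")

# Record warnings so the example can show that exactly one was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    disable_smoothing_if_multi_label(args, config)

print(args.label_smoothing_factor)  # 0.0
print(len(caught))                  # 1
```

A single-label or unset problem_type leaves the factor untouched and raises no warning, which matches the behavior described above.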