# Heuristic: WAInjectBench Balanced Class Weights for Imbalanced Data
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Classification |
| Last Updated | 2026-02-14 16:00 GMT |
## Overview
Using `class_weight="balanced"` in LogisticRegression for image embedding classifiers to handle imbalanced benign/malicious data distributions.
## Description
The image embedding classifier training script (`train/embedding-i.py`) uses `class_weight="balanced"` when instantiating `LogisticRegression`. This adjusts the loss function to weight each class inversely proportional to its frequency in the training data. In prompt injection detection, benign samples often outnumber malicious ones, so without balanced weights the classifier would be biased toward predicting "benign". Notably, the text embedding classifier (`train/embedding-t.py`) does not use balanced weights, relying on the default equal weighting.
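As a minimal sketch of the idea (toy synthetic data, not the project's actual embeddings or script), instantiating `LogisticRegression` with `class_weight="balanced"` on a 9:1 imbalanced set looks like this:

```python
# Sketch: balanced class weights on a toy imbalanced dataset.
# The 90/10 benign/malicious split and 8-dim "embeddings" are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (90, 8)),   # 90 "benign" samples (label 0)
    rng.normal(1.5, 1.0, (10, 8)),   # 10 "malicious" samples (label 1)
])
y = np.array([0] * 90 + [1] * 10)

clf = LogisticRegression(max_iter=2000, class_weight="balanced")
clf.fit(X, y)
```

With `class_weight="balanced"`, each malicious sample contributes roughly nine times the loss of a benign sample, so the fit is not dominated by the majority class.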
## Usage
Use this heuristic when training binary classifiers on datasets with uneven class distributions, particularly when the malicious/attack class is underrepresented. It is especially important for the image modality where data collection for attack images may be limited.
## The Insight (Rule of Thumb)
- Action: Set `class_weight="balanced"` in `LogisticRegression()` for image classifiers.
- Value: Sklearn auto-computes weights as `n_samples / (n_classes * np.bincount(y))`.
- Trade-off: May increase false positive rate (FPR) in exchange for higher true positive rate (TPR). The decision depends on whether recall (catching all attacks) is more important than precision (avoiding false alarms).
- Text vs Image: The text classifier omits this parameter, suggesting the text datasets are more balanced or that false positives are more costly for text detection.
## Reasoning
In security detection contexts, missing an attack (false negative) is typically worse than a false alarm (false positive). Balanced class weights ensure the classifier does not simply learn to predict the majority class. The difference between the image and text training scripts suggests a deliberate design choice: image attack datasets may be smaller or harder to collect, making class imbalance a bigger problem for the image modality.
## Code Evidence
Image classifier with balanced weights from `train/embedding-i.py:48-52`:

```python
clf = LogisticRegression(
    max_iter=2000,
    class_weight="balanced",
    n_jobs=-1,
)
```
Text classifier without balanced weights from `train/embedding-t.py:30`:

```python
clf = LogisticRegression(max_iter=1000)
```