Principle:Lakeraai Pint benchmark Balanced Accuracy Scoring
| Knowledge Sources | |
|---|---|
| Domains | Statistics, Model_Evaluation, Benchmarking |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
A scoring methodology that computes the mean of per-label accuracies to produce a fair evaluation metric on intentionally imbalanced datasets.
Description
Standard accuracy (correct predictions / total predictions) is misleading on imbalanced datasets. If a dataset contains 80% benign samples and 20% injection samples, a classifier that always predicts "benign" would achieve 80% accuracy while being completely useless for injection detection.
Balanced accuracy solves this by computing accuracy separately for each label class and then averaging the results:
- Compute accuracy on positive samples (injection): TP / (TP + FN)
- Compute accuracy on negative samples (benign): TN / (TN + FP)
- Average the two: (positive_accuracy + negative_accuracy) / 2
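The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the PINT implementation: the function names and the confusion-matrix counts are assumptions chosen to match the always-"benign" classifier on an 80/20 split described earlier.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of the accuracy on positive (injection) and negative (benign) samples.

    Assumes each class has at least one sample; otherwise the division fails.
    """
    positive_accuracy = tp / (tp + fn)
    negative_accuracy = tn / (tn + fp)
    return (positive_accuracy + negative_accuracy) / 2


def standard_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Correct predictions divided by total predictions."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical classifier that always predicts "benign",
# evaluated on 80 benign and 20 injection samples.
always_benign = dict(tp=0, fn=20, tn=80, fp=0)

standard = standard_accuracy(**always_benign)
balanced = balanced_accuracy(**always_benign)
print(f"standard={standard:.2f} balanced={balanced:.2f}")
```

Under these assumed counts the standard score looks respectable while the balanced score drops to chance level, which is the behavior the metric is meant to expose.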
The PINT Benchmark dataset (3,016 English and 1,298 non-English samples across categories such as prompt_injection, jailbreak, hard_negatives, chat, and documents) is intentionally imbalanced because real-world usage patterns show benign inputs vastly outnumbering malicious ones. Balanced scoring ensures the benchmark rewards systems that perform well on both classes.
Usage
Use balanced accuracy (the default in PINT) when evaluating on the standard PINT dataset or any imbalanced custom dataset. Switch to imbalanced (standard) accuracy only when your dataset has equal class representation or when you want to see raw accuracy figures.
Theoretical Basis
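Balanced accuracy is defined as:
$$
\text{Balanced Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)
$$
Equivalently, using the PINT implementation:
$$
\text{Balanced Accuracy} = \frac{1}{|L|} \sum_{l \in L} \frac{\text{correct}_l}{\text{total}_l}
$$
Where $L$ is the set of label values, and $\text{correct}_l$ and $\text{total}_l$ are the counts of correct predictions and of all samples for each label.
Imbalanced accuracy (the alternative) is simply:
$$
\text{Imbalanced Accuracy} = \frac{\sum_{l \in L} \text{correct}_l}{\sum_{l \in L} \text{total}_l}
$$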
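As a worked illustration with made-up counts (not PINT results), suppose 20 injection samples of which 10 are classified correctly, and 80 benign samples of which 76 are classified correctly. Averaging the per-label accuracies and pooling all samples then give noticeably different scores:

```latex
\text{balanced} = \frac{1}{2}\left(\frac{10}{20} + \frac{76}{80}\right)
                = \frac{0.5 + 0.95}{2} = 0.725
\qquad
\text{imbalanced} = \frac{10 + 76}{20 + 80} = 0.86
```

The pooled figure is dominated by the larger benign class, while the balanced figure gives the weaker injection accuracy equal weight.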
Per-category breakdown: In addition to the overall score, the PINT Benchmark provides accuracy broken down by (category, label) pairs, enabling analysis of which specific categories or attack types the system handles well or poorly.
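A breakdown by (category, label) pairs could be computed along the following lines. This is a sketch using only the standard library; the `results` rows, the helper name `accuracy_by_category_and_label`, and the convention that label `True` means injection are all assumptions for illustration, not PINT's actual data format.

```python
from collections import defaultdict

# Hypothetical results: (category, label, correct) for each sample.
# label True = injection, False = benign; correct = prediction matched the label.
results = [
    ("prompt_injection", True, True),
    ("prompt_injection", True, False),
    ("jailbreak", True, True),
    ("hard_negatives", False, False),
    ("hard_negatives", False, True),
    ("chat", False, True),
    ("documents", False, True),
]


def accuracy_by_category_and_label(rows):
    """Return {(category, label): accuracy} for the given result rows."""
    counts = defaultdict(lambda: [0, 0])  # (category, label) -> [correct, total]
    for category, label, correct in rows:
        counts[(category, label)][0] += int(correct)
        counts[(category, label)][1] += 1
    return {key: c / t for key, (c, t) in counts.items()}


breakdown = accuracy_by_category_and_label(results)
for (category, label), acc in sorted(breakdown.items()):
    print(f"{category:<18} label={label!s:<5} accuracy={acc:.2f}")
```

Reading the rows side by side makes it easier to spot, for example, a system that handles ordinary chat well but misfires on hard negatives.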
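# Abstract algorithm sketched with pandas (NOT the real PINT implementation)
import pandas as pd
# Hypothetical results: one row per sample, with its true label and whether the prediction was correct
results = pd.DataFrame({"label": [True, True, False, False, False], "correct": [True, False, True, True, True]})
# Group results by label, counting correct and total predictions
per_label = results.groupby("label")["correct"].agg(correct="sum", total="count")
# Compute accuracy per label
per_label["accuracy"] = per_label["correct"] / per_label["total"]
# Balanced score is the mean of the per-label accuracies
balanced_score = per_label["accuracy"].mean()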