Principle:Lakeraai Pint benchmark Balanced Accuracy Scoring

Knowledge Sources	A Systematic Analysis of Performance Measures for Classification Tasks; PINT Benchmark
Domains	Statistics, Model_Evaluation, Benchmarking
Last Updated	2026-02-14 14:00 GMT

Overview

A scoring methodology that computes the mean of per-label accuracies to produce a fair evaluation metric on intentionally imbalanced datasets.

Description

Standard accuracy (correct predictions / total predictions) is misleading on imbalanced datasets. If a dataset contains 80% benign samples and 20% injection samples, a classifier that always predicts "benign" would achieve 80% accuracy while being completely useless for injection detection.
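To make the 80/20 example concrete, the following minimal sketch scores a hypothetical always-benign classifier. The sample counts (80 benign, 20 injection) and all variable names are illustrative assumptions, not PINT data.

```python
# Hypothetical dataset: 80 benign (False) and 20 injection (True) samples.
labels = [False] * 80 + [True] * 20

# A degenerate classifier that always predicts "benign" (False).
predictions = [False] * len(labels)

# Standard accuracy: correct predictions / total predictions.
correct = sum(p == y for p, y in zip(predictions, labels))
standard_accuracy = correct / len(labels)

# Accuracy on the injection class alone: how many injections were caught.
injection_predictions = [p for p, y in zip(predictions, labels) if y]
injection_accuracy = sum(injection_predictions) / len(injection_predictions)

print(standard_accuracy)   # 0.8
print(injection_accuracy)  # 0.0
```

The headline figure looks acceptable even though not a single injection is detected.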

Balanced accuracy solves this by computing accuracy separately for each label class and then averaging them:

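Compute accuracy on positive samples (injection), also known as recall or the true positive rate: TP / (TP + FN)
Compute accuracy on negative samples (benign), also known as specificity or the true negative rate: TN / (TN + FP)
Average the two: (positive_accuracy + negative_accuracy) / 2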
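The three steps above can be sketched as a small function. This is an illustrative helper, not the PINT implementation; the name `balanced_accuracy` and its parameters are assumptions, and it presumes each class has at least one sample.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of the positive-class and negative-class accuracies."""
    positive_accuracy = tp / (tp + fn)  # accuracy on injection samples
    negative_accuracy = tn / (tn + fp)  # accuracy on benign samples
    return (positive_accuracy + negative_accuracy) / 2


# Always-benign classifier on 20 injection and 80 benign samples:
# no injection is caught (tp=0, fn=20), every benign sample is right (tn=80, fp=0).
always_benign = balanced_accuracy(tp=0, fn=20, tn=80, fp=0)
print(always_benign)  # 0.5
```

A classifier that ignores one class entirely scores 0.5, the same as random guessing on a two-class problem, regardless of how large the other class is.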

The PINT Benchmark dataset (3,016 English and 1,298 non-English samples across categories like prompt_injection, jailbreak, hard_negatives, chat, documents) is intentionally imbalanced because real-world usage patterns show benign inputs vastly outnumbering malicious ones. Balanced scoring ensures the benchmark rewards systems that perform well on both classes, rather than systems that simply favor the majority class.

Usage

Use balanced accuracy (the default in PINT) when evaluating on the standard PINT dataset or any imbalanced custom dataset. Switch to imbalanced (standard) accuracy only when your dataset has equal class representation, in which case the two scores coincide, or when you want to see raw accuracy figures.

Theoretical Basis

Balanced accuracy is defined as:

$\text{Balanced Accuracy} = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)$

Equivalently, using the PINT implementation:

$\text{Balanced Score} = \frac{1}{|L|} \sum_{l \in L} \frac{\text{correct}_{l}}{\text{total}_{l}}$

Where $L = \{\text{True}, \text{False}\}$ is the set of label values, and $\text{correct}_{l}$ and $\text{total}_{l}$ are the numbers of correctly classified and total samples with label $l$.

Imbalanced accuracy (the alternative) is simply:

$\text{Imbalanced Accuracy} = \frac{\sum_{l \in L} \text{correct}_{l}}{\sum_{l \in L} \text{total}_{l}}$
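As a rough check that the confusion-matrix form and the per-label form agree, and that both differ from imbalanced accuracy, the sketch below evaluates all three on hypothetical counts. The numbers and variable names are assumptions chosen for illustration, not PINT results.

```python
# Hypothetical confusion-matrix counts (not PINT results).
tp, fn, tn, fp = 15, 5, 72, 8

# Per-label counts: label -> (correct, total).
# For label True, correct = TP and total = TP + FN;
# for label False, correct = TN and total = TN + FP.
counts = {True: (tp, tp + fn), False: (tn, tn + fp)}

# Confusion-matrix form of balanced accuracy.
balanced_from_confusion = 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Per-label form: unweighted mean of correct_l / total_l.
balanced_from_labels = sum(c / t for c, t in counts.values()) / len(counts)

# Imbalanced accuracy: pooled correct over pooled total.
imbalanced = sum(c for c, _ in counts.values()) / sum(t for _, t in counts.values())

print(f"{balanced_from_labels:.3f}")  # 0.825
print(f"{imbalanced:.3f}")            # 0.870
```

The imbalanced figure is pulled toward the accuracy of the larger benign class (0.90), while the balanced score weights both classes equally.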

Per-category breakdown: In addition to the overall score, the PINT Benchmark provides accuracy broken down by (category, label) pairs, enabling analysis of which specific categories or attack types the system handles well or poorly.
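A per-(category, label) breakdown of this kind could be computed along the following lines. This is a sketch under stated assumptions: the record layout `(category, label, correct)` and the sample rows are hypothetical, and only the category names are taken from the dataset description.

```python
from collections import defaultdict

# Hypothetical results: (category, label, correct) per evaluated sample.
results = [
    ("prompt_injection", True, True),
    ("prompt_injection", True, False),
    ("jailbreak", True, True),
    ("chat", False, True),
    ("hard_negatives", False, False),
    ("hard_negatives", False, True),
]

# Tally [correct, total] for each (category, label) pair.
tally = defaultdict(lambda: [0, 0])
for category, label, correct in results:
    tally[(category, label)][0] += int(correct)
    tally[(category, label)][1] += 1

# Accuracy per (category, label) pair.
breakdown = {key: c / t for key, (c, t) in tally.items()}
for (category, label), accuracy in sorted(breakdown.items()):
    print(f"{category:<18} label={label!s:<5} accuracy={accuracy:.2f}")
```

Reading the rows side by side shows, for example, whether errors concentrate in hard_negatives (false alarms) or in a particular attack category (missed detections).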

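# Abstract algorithm (NOT the real implementation), sketched with pandas
import pandas as pd

# Hypothetical per-sample results; each row counts as one evaluated sample
results = pd.DataFrame({
    "label": [True, True, False, False, False],
    "correct": [1, 0, 1, 1, 1],
    "total": [1, 1, 1, 1, 1],
})

# Group results by label and sum the counts
per_label = results.groupby("label")[["correct", "total"]].sum()

# Compute accuracy per label
per_label["accuracy"] = per_label["correct"] / per_label["total"]

# Balanced score is the unweighted mean of the per-label accuracies
balanced_score = per_label["accuracy"].mean()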

Related Pages

Implemented By

Implementation:Lakeraai_Pint_benchmark_Benchmark_Results_Interpretation

Uses Heuristic

Heuristic:Lakeraai_Pint_benchmark_Use_Balanced_Accuracy_For_Imbalanced_Datasets
