Principle:Norrrrrrr lyn WAInjectBench Detection Rate Computation
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Security, Statistics |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A binary classification evaluation metric that computes True Positive Rate and False Positive Rate from detector outputs against labeled benchmark data.
Description
Detection Rate Computation is the core evaluation metric in prompt injection detection. It measures two complementary quantities:
- True Positive Rate (TPR): The fraction of malicious samples correctly flagged by the detector. Also known as recall or sensitivity.
- False Positive Rate (FPR): The fraction of benign samples incorrectly flagged as malicious.
An ideal detector achieves TPR close to 1.0 and FPR close to 0.0. The WAInjectBench benchmark evaluates each detector on a per-file (text) or per-folder (image) basis, computing these rates for each scenario independently to provide granular performance analysis.
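For reference, the two rates can be written in standard confusion-matrix notation. The symbols TP, FN, FP, and TN (true positives, false negatives, false positives, true negatives) are introduced here for illustration and do not appear in the benchmark code:

\[
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
\]

Because each file or folder contains samples of a single class, the flagged count in a malicious source corresponds to TP and its sample total to TP + FN, while the flagged count in a benign source corresponds to FP and its sample total to FP + TN.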
Usage
Use this metric whenever evaluating binary detection performance. It is computed inline within the process_file (text) and process_folder (image) functions, and again in the ensemble aggregation step.
Theoretical Basis
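For a single file or folder, the rate is the number of flagged samples divided by the number of samples:
rate = |detect_ids| / total_num (defined as 0.0 when total_num is 0)
Here detect_ids is the set of sample IDs flagged by the detector, and total_num is the total number of samples in the file/folder. The metric type (TPR vs FPR) is determined by the ground-truth label of the data source (malicious vs benign directory): the same ratio is reported as TPR for a malicious source and as FPR for a benign one.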
# Pseudocode for detection rate computation
rate = len(detect_ids) / total_num if total_num > 0 else 0.0
rate = round(rate, 4) # 4 decimal precision
metric_name = "tpr" if is_malicious else "fpr"
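The pseudocode above can be wrapped into a small runnable sketch. The function name `detection_rate`, its signature, and the sample IDs below are hypothetical and chosen only for illustration; they are not taken from the WAInjectBench code, where the computation is described as inline within process_file and process_folder.

```python
from typing import Collection, Hashable, Tuple


def detection_rate(
    detect_ids: Collection[Hashable],
    total_num: int,
    is_malicious: bool,
) -> Tuple[str, float]:
    """Return (metric_name, rate) for one file or folder (illustrative helper)."""
    # Assumption: detect_ids is treated as a set, so an ID listed twice counts once.
    flagged = len(set(detect_ids))
    # Guard against empty sources to avoid division by zero.
    rate = flagged / total_num if total_num > 0 else 0.0
    rate = round(rate, 4)  # 4 decimal precision, as in the pseudocode
    # The label of the data source decides which metric the ratio represents.
    metric_name = "tpr" if is_malicious else "fpr"
    return metric_name, rate


# Made-up example data: 7 of 8 malicious samples flagged, 1 of 40 benign samples flagged.
malicious_result = detection_rate({1, 2, 5, 7, 8, 9, 10}, 8, is_malicious=True)
benign_result = detection_rate({3}, 40, is_malicious=False)
empty_result = detection_rate([], 0, is_malicious=True)

print(malicious_result)  # ('tpr', 0.875)
print(benign_result)     # ('fpr', 0.025)
print(empty_result)      # ('tpr', 0.0)
```

Returning the metric name together with the value keeps the TPR/FPR distinction attached to the number, which can help when results from malicious and benign sources are collected into one report.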