Principle:Norrrrrrr lyn WAInjectBench Detection Rate Computation

Knowledge Sources	Evaluating Prompt Injection
Domains	Evaluation, Security, Statistics
Last Updated	2026-02-14 16:00 GMT

Overview

A binary classification evaluation metric that computes True Positive Rate and False Positive Rate from detector outputs against labeled benchmark data.

Description

Detection Rate Computation is the core evaluation metric for prompt injection detectors in WAInjectBench. It measures two complementary quantities:

True Positive Rate (TPR): The fraction of malicious samples correctly flagged by the detector. Also known as recall or sensitivity. $TPR = \frac{|\text{detected} \cap \text{malicious}|}{|\text{malicious}|}$
False Positive Rate (FPR): The fraction of benign samples incorrectly flagged as malicious. $FPR = \frac{|\text{detected} \cap \text{benign}|}{|\text{benign}|}$
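The two definitions above can be sketched in Python with set operations. This is a minimal illustration, assuming ground-truth labels arrive as a mapping from sample ID to a boolean (True for malicious); the helper name tpr_fpr and that input shape are hypothetical, not part of WAInjectBench.

```python
from typing import Dict, Set, Tuple


def tpr_fpr(flagged: Set[str], labels: Dict[str, bool]) -> Tuple[float, float]:
    """Return (TPR, FPR) for a set of flagged sample IDs.

    Hypothetical helper for illustration: `labels` maps each sample ID to
    True (malicious) or False (benign).
    """
    malicious = {sid for sid, is_mal in labels.items() if is_mal}
    benign = {sid for sid, is_mal in labels.items() if not is_mal}
    # |detected ∩ malicious| / |malicious|, returning 0.0 for an empty class
    tpr = len(flagged & malicious) / len(malicious) if malicious else 0.0
    # |detected ∩ benign| / |benign|, returning 0.0 for an empty class
    fpr = len(flagged & benign) / len(benign) if benign else 0.0
    return tpr, fpr
```

With four malicious samples (three flagged) and two benign samples (one flagged), tpr_fpr returns (0.75, 0.5). Returning 0.0 for an empty class mirrors the total_num > 0 guard in the pseudocode under Theoretical Basis.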

An ideal detector achieves TPR close to 1.0 and FPR close to 0.0. The WAInjectBench benchmark evaluates each detector on a per-file (text) or per-folder (image) basis, computing these rates for each scenario independently to provide granular performance analysis.
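A per-scenario evaluation loop might look like the following sketch. It assumes each scenario (one text file or one image folder) is described by its sample IDs, the IDs a detector flagged, and a single ground-truth label; the Scenario class and per_scenario_rates function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Scenario:
    """Hypothetical container: one text file or image folder with a single label."""
    sample_ids: List[str]
    flagged: Set[str]
    is_malicious: bool


def per_scenario_rates(scenarios: Dict[str, Scenario]) -> Dict[str, Dict[str, float]]:
    """Compute one rounded rate per scenario, reported as TPR or FPR by label."""
    results: Dict[str, Dict[str, float]] = {}
    for name, sc in scenarios.items():
        total = len(sc.sample_ids)
        # Count only flagged IDs that actually belong to this scenario
        hits = len(sc.flagged & set(sc.sample_ids))
        rate = round(hits / total, 4) if total > 0 else 0.0
        metric = "tpr" if sc.is_malicious else "fpr"
        results[name] = {metric: rate}
    return results
```

For a malicious scenario with samples a, b, c where a and c are flagged, plus a benign scenario with four samples of which one is flagged, the result is {"malicious_scenario": {"tpr": 0.6667}, "benign_scenario": {"fpr": 0.25}}. Keeping one entry per scenario, rather than pooling all samples, is what allows the scenario-by-scenario analysis described above.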

Usage

Use this metric whenever evaluating binary detection performance. It is computed inline within the process_file (text) and process_folder (image) functions, and again in the ensemble aggregation step.
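The source does not spell out how the ensemble combines its member detectors, so the sketch below assumes a simple vote: a sample counts as detected when at least min_votes members flag it, with min_votes=1 giving an OR-style union. The function ensemble_rate and this voting rule are hypothetical; once the combined detect_ids set is formed, the rate is computed exactly as for a single detector.

```python
from collections import Counter
from typing import Iterable, Set


def ensemble_rate(member_flags: Iterable[Set[str]], total_num: int, min_votes: int = 1) -> float:
    """Rate for a hypothetical voting ensemble on one file/folder.

    A sample counts as detected when at least `min_votes` member detectors
    flagged it (min_votes=1 is an OR-union). This rule is an assumption.
    """
    votes: Counter = Counter()
    for flags in member_flags:
        votes.update(set(flags))  # each detector votes at most once per sample
    detect_ids = {sid for sid, n in votes.items() if n >= min_votes}
    rate = len(detect_ids) / total_num if total_num > 0 else 0.0
    return round(rate, 4)  # same 4-decimal precision as a single detector
```

With member flags {a, b}, {b, c}, {b} on a source of four samples, the OR-union gives 0.75 and requiring two votes gives 0.25. Raising min_votes can only shrink detect_ids, so it never increases either TPR or FPR.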

Theoretical Basis

$TPR = \frac{|D_{\text{flagged}} \cap S_{\text{malicious}}|}{|S_{\text{malicious}}|} = \frac{\texttt{len}(\texttt{detect\_ids})}{\texttt{total\_num}}$

$FPR = \frac{|D_{\text{flagged}} \cap S_{\text{benign}}|}{|S_{\text{benign}}|} = \frac{\texttt{len}(\texttt{detect\_ids})}{\texttt{total\_num}}$

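Here, detect_ids is the set of sample IDs flagged by the detector and total_num is the total number of samples in the file or folder. Because every sample in a given file or folder shares the ground-truth label of its data source (malicious or benign directory), the intersection with S reduces to detect_ids itself and |S| equals total_num. That same label determines whether the resulting rate is reported as TPR or FPR.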
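A worked example with made-up numbers (illustrative only, not a benchmark result): suppose a benign folder contains total_num = 30 samples and the detector flags 2 of them. Then

$$FPR = \frac{|D_{\text{flagged}} \cap S_{\text{benign}}|}{|S_{\text{benign}}|} = \frac{2}{30} = 0.0\overline{6} \approx 0.0667$$

after rounding to four decimal places.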

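# Pseudocode for detection rate computation, wrapped in a hypothetical helper
def detection_rate(detect_ids, total_num, is_malicious):
    rate = len(detect_ids) / total_num if total_num > 0 else 0.0  # guard empty sources
    rate = round(rate, 4)  # 4 decimal precision
    metric_name = "tpr" if is_malicious else "fpr"  # set by the source's ground-truth label
    return metric_name, rate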

Related Pages

Implemented By

Implementation:Norrrrrrr_lyn_WAInjectBench_TPR_FPR_Calculation
