Principle:Norrrrrrr lyn WAInjectBench Detection Rate Computation

From Leeroopedia
Knowledge Sources
Domains Evaluation, Security, Statistics
Last Updated 2026-02-14 16:00 GMT

Overview

A binary classification evaluation metric that computes True Positive Rate and False Positive Rate from detector outputs against labeled benchmark data.

Description

Detection Rate Computation is the core evaluation metric in prompt injection detection. It measures two complementary quantities:

  • True Positive Rate (TPR): The fraction of malicious samples correctly flagged by the detector. Also known as recall or sensitivity. $\mathrm{TPR} = \frac{|\text{detected} \cap \text{malicious}|}{|\text{malicious}|}$
  • False Positive Rate (FPR): The fraction of benign samples incorrectly flagged as malicious. $\mathrm{FPR} = \frac{|\text{detected} \cap \text{benign}|}{|\text{benign}|}$

An ideal detector achieves TPR close to 1.0 and FPR close to 0.0. The WAInjectBench benchmark evaluates each detector on a per-file (text) or per-folder (image) basis, computing these rates for each scenario independently to provide granular performance analysis.
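The two definitions above can be illustrated with a minimal Python sketch. The function name `detection_rates` and the sample IDs are hypothetical and chosen only for illustration; they are not taken from the WAInjectBench code. The sketch assumes the detector output and the ground-truth labels are available as sets of sample IDs.

```python
def detection_rates(flagged_ids, malicious_ids, benign_ids):
    """Return (tpr, fpr) for one scenario; all arguments are collections of sample IDs.

    Hypothetical helper: the names and signature are assumptions for illustration.
    """
    flagged = set(flagged_ids)
    malicious = set(malicious_ids)
    benign = set(benign_ids)
    # TPR: flagged malicious samples over all malicious samples.
    tpr = len(flagged & malicious) / len(malicious) if malicious else 0.0
    # FPR: flagged benign samples over all benign samples.
    fpr = len(flagged & benign) / len(benign) if benign else 0.0
    return tpr, fpr


# Made-up example data: 4 malicious and 5 benign samples.
malicious = {"m1", "m2", "m3", "m4"}
benign = {"b1", "b2", "b3", "b4", "b5"}
flagged = {"m1", "m2", "m3", "b2"}  # detector output

tpr, fpr = detection_rates(flagged, malicious, benign)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

In this made-up scenario the detector catches three of four malicious samples and wrongly flags one of five benign samples, so the two rates summarize both kinds of error separately.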

Usage

Use this metric whenever evaluating binary detection performance. It is computed inline within the process_file (text) and process_folder (image) functions, and again in the ensemble aggregation step.
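The source does not say how the ensemble step combines individual detectors. As a rough sketch only, the example below assumes a union (OR) rule, where a sample counts as flagged if any detector flags it, and then applies the same rate formula. The function name `ensemble_rate` and the input layout are hypothetical.

```python
def ensemble_rate(per_detector_ids, total_num):
    """Detection rate of an OR-ensemble over several detectors.

    Assumption: the ensemble flags a sample if at least one detector flags it.
    This is an illustrative rule, not necessarily what the benchmark does.
    """
    # Union of all flagged sample IDs across detectors.
    combined = set().union(*per_detector_ids)
    rate = len(combined) / total_num if total_num > 0 else 0.0
    return round(rate, 4)


# Three hypothetical detectors run on the same file of 10 samples.
flagged_by_detector = [{1, 2}, {2, 3}, set()]
print(ensemble_rate(flagged_by_detector, total_num=10))
```

Under a union rule the ensemble flags a superset of what each detector flags, so its rate on a given file is never lower than that of any single detector. On malicious data this raises TPR, and on benign data it raises FPR by the same mechanism.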

Theoretical Basis

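$$\mathrm{TPR} = \frac{|D_{\text{flagged}} \cap S_{\text{malicious}}|}{|S_{\text{malicious}}|} = \frac{\texttt{len(detect\_ids)}}{\texttt{total\_num}}$$

$$\mathrm{FPR} = \frac{|D_{\text{flagged}} \cap S_{\text{benign}}|}{|S_{\text{benign}}|} = \frac{\texttt{len(detect\_ids)}}{\texttt{total\_num}}$$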

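Here detect_ids is the set of sample IDs flagged by the detector, and total_num is the total number of samples in the file or folder. Both rates reduce to the same expression because each data source holds samples of a single class: every flagged ID in a malicious source is a true positive, and every flagged ID in a benign source is a false positive. The metric type (TPR vs FPR) is therefore determined by the ground-truth label of the data source (malicious vs benign directory).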

# Pseudocode for detection rate computation
rate = len(detect_ids) / total_num if total_num > 0 else 0.0
rate = round(rate, 4)  # 4 decimal precision
metric_name = "tpr" if is_malicious else "fpr"
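A runnable version of the pseudocode above might look like the following sketch. The function name `compute_rate` and the dictionary return shape are assumptions made for illustration; only `detect_ids`, `total_num`, `is_malicious`, the rounding to four decimals, and the "tpr"/"fpr" names come from the pseudocode.

```python
def compute_rate(detect_ids, total_num, is_malicious):
    """Return {"tpr": rate} for a malicious source or {"fpr": rate} for a benign one.

    Hypothetical wrapper around the pseudocode; the return format is an assumption.
    """
    # Guard against empty files or folders to avoid division by zero.
    rate = len(detect_ids) / total_num if total_num > 0 else 0.0
    rate = round(rate, 4)  # 4 decimal precision
    metric_name = "tpr" if is_malicious else "fpr"
    return {metric_name: rate}


# A malicious file with 8 samples, 3 of them flagged.
print(compute_rate({"s1", "s2", "s3"}, total_num=8, is_malicious=True))
# A benign file with 3 samples, 1 of them flagged.
print(compute_rate({"s7"}, total_num=3, is_malicious=False))
```

Returning 0.0 for an empty source keeps the computation from failing, but it also makes an empty benign file look like a perfect result, so callers may want to report total_num alongside the rate.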

Related Pages

Implemented By
