Principle:Liu00222 Open Prompt Injection Evaluation Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Prompt_Injection, Metrics |
| Last Updated | 2026-02-14 15:00 GMT |
Overview
A comprehensive metrics evaluation framework that quantifies prompt injection attack effectiveness through four complementary metrics: PNA-T, PNA-I, ASV, and MR.
Description
The Evaluation Pipeline computes four metrics that together provide a complete picture of prompt injection attack impact:
- PNA-T (Prediction Accuracy on Target Task): How well the application performs its original target task under attack. High PNA-T means the target task is still functional.
- PNA-I (Prediction Accuracy on Injected Task): How well the model performs the injected task when directly prompted (baseline for attacker capability). High PNA-I means the model can do the injected task.
- ASV (Attack Success Value): How successfully the attack causes the model to perform the injected task instead of the target task. High ASV means the attack succeeds.
- MR (Matching Rate): How closely attack responses match the injected task baseline responses. High MR means the model behaves consistently under attack.
Usage
Use this principle at the end of an experiment pipeline after collecting target task responses, injected task responses, and attack responses. The four metrics together determine whether an attack is effective and whether defenses mitigate it.
Theoretical Basis
The metrics are computed as accuracy or similarity scores over paired response-label or response-response comparisons:
For classification tasks, `eval` is exact match after normalization. For generation tasks (gigaword), ROUGE-1 F-score is used. For grammar correction (jfleg), GLEU score is used.