Implementation:Liu00222 Open Prompt Injection create evaluator
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-14 15:00 GMT |
Overview
Concrete factory function for creating an Evaluator that computes prompt injection attack metrics, provided by the OpenPromptInjection evaluator module.
Description
The create_evaluator function instantiates an Evaluator object that automatically computes all four metrics (PNA-T, PNA-I, ASV, MR) in its `__init__` method. It uses task-specific evaluation functions (exact match for classification, ROUGE for summarization, GLEU for grammar correction) and stores results as attributes.
Usage
Call this function at the end of an experiment after collecting all response arrays. The returned Evaluator object has `.pna_t`, `.pna_i`, `.asv`, and `.mr` attributes with computed metric values.
Code Reference
Source Location
- Repository: Open-Prompt-Injection
- File: OpenPromptInjection/evaluator/__init__.py
- Lines: L4-5
Signature
def create_evaluator(target_task_responses, target_task,
injected_task_responses, injected_task,
attack_responses):
"""
Factory function to create an Evaluator with computed metrics.
Args:
target_task_responses: List/array of target task baseline responses.
target_task: TargetTask instance (provides labels and dataset name).
injected_task_responses: List/array of injected task baseline responses (or None).
injected_task: InjectedTask instance.
attack_responses: List/array of attack responses.
Returns:
Evaluator: Instance with .pna_t, .pna_i, .asv, .mr attributes.
"""
return Evaluator(target_task_responses, target_task,
injected_task_responses, injected_task,
attack_responses)
Import
import OpenPromptInjection as PI
# or
from OpenPromptInjection import create_evaluator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| target_task_responses | list or ndarray | Yes | Baseline responses on target task (from Step 4) |
| target_task | TargetTask | Yes | Target task instance with labels |
| injected_task_responses | list, ndarray, or None | Yes | Baseline responses on injected task (None if defense active) |
| injected_task | InjectedTask | Yes | Injected task instance with labels |
| attack_responses | list or ndarray | Yes | Responses to attacked prompts (from Step 6) |
Outputs
| Name | Type | Description |
|---|---|---|
| evaluator.pna_t | float | Prediction accuracy on target task (0.0 to 1.0) |
| evaluator.pna_i | float or None | Prediction accuracy on injected task (None if no baseline) |
| evaluator.asv | float | Attack success value (0.0 to 1.0) |
| evaluator.mr | float or None | Matching rate between attack and injected baseline (None if no baseline) |
Usage Examples
Complete Evaluation
import OpenPromptInjection as PI
evaluator = PI.create_evaluator(
target_task_responses=target_task_responses,
target_task=target_task,
injected_task_responses=injected_task_responses,
injected_task=attacker.task,
attack_responses=attack_responses
)
print(f"PNA-T = {evaluator.pna_t}") # Target task accuracy
print(f"PNA-I = {evaluator.pna_i}") # Injected task accuracy
print(f"ASV = {evaluator.asv}") # Attack success
print(f"MR = {evaluator.mr}") # Matching rate