Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Liu00222 Open Prompt Injection create evaluator

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-14 15:00 GMT

Overview

Concrete factory function for creating an Evaluator that computes prompt injection attack metrics, provided by the OpenPromptInjection evaluator module.

Description

The create_evaluator function instantiates an Evaluator object that automatically computes all four metrics (PNA-T, PNA-I, ASV, MR) in its `__init__` method. It uses task-specific evaluation functions (exact match for classification, ROUGE for summarization, GLEU for grammar correction) and stores results as attributes.

Usage

Call this function at the end of an experiment after collecting all response arrays. The returned Evaluator object has `.pna_t`, `.pna_i`, `.asv`, and `.mr` attributes with computed metric values.

Code Reference

Source Location

Signature

def create_evaluator(target_task_responses, target_task,
                     injected_task_responses, injected_task,
                     attack_responses):
    """
    Factory function to create an Evaluator with computed metrics.

    Args:
        target_task_responses: List/array of target task baseline responses.
        target_task: TargetTask instance (provides labels and dataset name).
        injected_task_responses: List/array of injected task baseline responses (or None).
        injected_task: InjectedTask instance.
        attack_responses: List/array of attack responses.
    Returns:
        Evaluator: Instance with .pna_t, .pna_i, .asv, .mr attributes.
    """
    return Evaluator(target_task_responses, target_task,
                     injected_task_responses, injected_task,
                     attack_responses)

Import

import OpenPromptInjection as PI
# or
from OpenPromptInjection import create_evaluator

I/O Contract

Inputs

Name Type Required Description
target_task_responses list or ndarray Yes Baseline responses on target task (from Step 4)
target_task TargetTask Yes Target task instance with labels
injected_task_responses list, ndarray, or None Yes Baseline responses on injected task (None if defense active)
injected_task InjectedTask Yes Injected task instance with labels
attack_responses list or ndarray Yes Responses to attacked prompts (from Step 6)

Outputs

Name Type Description
evaluator.pna_t float Prediction accuracy on target task (0.0 to 1.0)
evaluator.pna_i float or None Prediction accuracy on injected task (None if no baseline)
evaluator.asv float Attack success value (0.0 to 1.0)
evaluator.mr float or None Matching rate between attack and injected baseline (None if no baseline)

Usage Examples

Complete Evaluation

import OpenPromptInjection as PI

evaluator = PI.create_evaluator(
    target_task_responses=target_task_responses,
    target_task=target_task,
    injected_task_responses=injected_task_responses,
    injected_task=attacker.task,
    attack_responses=attack_responses
)

print(f"PNA-T = {evaluator.pna_t}")   # Target task accuracy
print(f"PNA-I = {evaluator.pna_i}")   # Injected task accuracy
print(f"ASV = {evaluator.asv}")       # Attack success
print(f"MR = {evaluator.mr}")         # Matching rate

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment