Principle:Lakeraai Pint benchmark Custom Eval Function Interface
| Knowledge Sources | |
|---|---|
| Domains | Software_Design, Prompt_Injection, Model_Evaluation |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
A callback interface pattern that decouples the benchmark execution loop from the specific prompt injection detection system being evaluated.
Description
The Custom Eval Function Interface defines a contract: any prompt injection detection system can be benchmarked by implementing a single function with the signature eval_function(prompt: str) -> bool. This function accepts a text string and returns True if prompt injection is detected, False otherwise.
This pattern solves the problem of heterogeneous detection system APIs. Prompt injection detectors vary widely:
- API-based services (Lakera Guard, AWS Bedrock Guardrails, Azure AI Content Safety) require HTTP calls with different authentication and response formats.
- Local models (HuggingFace transformers, SetFit) require model loading and inference.
- SDK-based tools (WhyLabs LangKit) have their own Python APIs with unique output schemas.
By standardizing the interface to a single (str) -> bool function, the benchmark runner can evaluate any system without modification.
Usage
Use this pattern when integrating a new prompt injection detection system into the PINT Benchmark. You must write a wrapper function that calls your system's API or SDK and returns a boolean. This function is then passed as the eval_function parameter to pint_benchmark().
Theoretical Basis
The pattern follows the Strategy design pattern (also known as the Callback pattern):
# Abstract interface specification (NOT real implementation)
def eval_function(prompt: str) -> bool:
"""
Contract:
- Input: A single text string (the prompt to evaluate)
- Output: True if prompt injection detected, False otherwise
- Side effects: None required (may make API calls internally)
- Error handling: Should raise exceptions on failure (not return None)
"""
...
The interface is deliberately minimal: a single string input and a single boolean output. This simplicity ensures that:
- Any system can be adapted with minimal wrapper code
- The benchmark loop does not need to know about authentication, response parsing, or model internals
- New detection systems can be added without modifying the benchmark code
Practical Guide
Step 1: Identify Your System's Detection Method
Determine how your system classifies prompts:
- API endpoint returning a JSON response with a flag field
- SDK function returning a score or label
- Local model returning predictions
Step 2: Write the Wrapper Function
Create a function that:
- Calls your system with the prompt string
- Parses the response to extract the injection detection result
- Returns
Truefor injection,Falsefor benign
Step 3: Handle Thresholds
If your system returns a score rather than a boolean, apply a threshold:
- Common threshold:
> 0.5for injection - Adjust based on your system's calibration
Step 4: Pass to pint_benchmark
pint_benchmark(
df=df,
eval_function=your_eval_function,
model_name="Your System Name",
)