Workflow:Lakeraai Pint benchmark Custom System Evaluation

Knowledge Sources	PINT Benchmark Guide to Prompt Injection
Domains	AI_Security, Benchmarking, Prompt_Injection_Detection
Last Updated	2026-02-14 14:00 GMT

Overview

End-to-end process for benchmarking a custom (non-Hugging Face) prompt injection detection system against the PINT dataset by writing a custom evaluation function callback.

Description

This workflow covers the procedure for evaluating any prompt injection detection system that is not a standard Hugging Face model. This includes commercial API services (such as Lakera Guard, AWS Bedrock Guardrails, Azure AI Prompt Shield, Google Model Armor), third-party libraries (such as WhyLabs LangKit), or custom in-house detection logic. The user writes a thin adapter function that accepts a single text string and returns a boolean indicating whether an injection was detected. This function is then passed to pint_benchmark() as a callback, and the benchmark handles dataset iteration, scoring, and result generation.

Usage

Execute this workflow when you want to benchmark a prompt injection detection system that does not use the standard Hugging Face text-classification pipeline. This includes API-based commercial services, custom rule-based detectors, ensemble systems, or any detection tool that exposes a programmatic interface returning injection verdicts. The custom eval function pattern is the most flexible integration path into the PINT Benchmark.

Execution Steps

Step 1: Environment Setup

Install the PINT Benchmark dependencies via Poetry and any additional libraries required by the target detection system. For API-based services, configure authentication credentials (API keys, endpoints). For library-based tools, install the relevant package in the benchmark notebook.

Key considerations:

Use Poetry for the base environment (poetry install)
Install detection system dependencies via pip in the notebook
Store API keys securely; do not commit credentials to the repository
Verify network connectivity for cloud-based detection APIs
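
One way to keep credentials out of the repository is to read them from environment variables inside the notebook. The sketch below is only an illustration: the variable name DETECTOR_API_KEY and the helper load_api_key are hypothetical and should be replaced with whatever the target detection service expects.

```python
import os


def load_api_key(var_name: str = "DETECTOR_API_KEY") -> str:
    """Read an API key from the environment and fail early if it is missing.

    DETECTOR_API_KEY is a hypothetical name used for illustration only.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before starting the notebook."
        )
    return api_key
```

Failing early here avoids discovering a missing credential partway through a long benchmark run.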

Step 2: Define Evaluation Function

Write a custom Python function that conforms to the PINT Benchmark's callback interface. The function must accept a single str parameter (the input text to classify) and return a bool (True if prompt injection is detected, False otherwise). The function encapsulates all interaction with the detection system, including API calls, response parsing, and threshold application.

Key considerations:

Function signature must be eval_function(text: str) -> bool
Handle any necessary response parsing (e.g., confidence thresholds, label mapping)
Consider rate limiting for API-based services
The function is called once per dataset input (4,314 times for the full PINT dataset)

Pseudocode:

Define function accepting text string
Call detection system with text
Parse response to extract injection verdict
Apply threshold or label mapping
Return boolean result
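
The pseudocode above could be sketched in Python roughly as follows. This is a minimal illustration, not the benchmark's own code: fake_detector_score is a stand-in for a real API or library call so that the example runs offline, and the 0.5 threshold and the make_eval_function helper are assumptions chosen for the example.

```python
from typing import Callable

# Hypothetical threshold; tune it for the detector being evaluated.
INJECTION_THRESHOLD = 0.5


def fake_detector_score(text: str) -> float:
    """Stand-in for a real detection call (API request, library call, ...).

    Returns a made-up confidence in [0, 1] so the example runs offline.
    """
    suspicious_phrases = ("ignore previous instructions", "disregard the above")
    lowered = text.lower()
    return 0.9 if any(p in lowered for p in suspicious_phrases) else 0.1


def make_eval_function(
    score_fn: Callable[[str], float],
    threshold: float = INJECTION_THRESHOLD,
) -> Callable[[str], bool]:
    """Wrap a score-returning detector as a str -> bool callback."""

    def eval_function(text: str) -> bool:
        score = score_fn(text)  # call the detection system
        return bool(score >= threshold)  # apply the threshold, return a verdict

    return eval_function


eval_function = make_eval_function(fake_detector_score)
```

With a real detector, only the scoring function would change; the resulting eval_function keeps the required text: str -> bool shape.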

Step 3: Benchmark Execution

Pass the custom evaluation function to pint_benchmark() along with a descriptive model_name string. The benchmark loads the PINT dataset, iterates over all inputs, calls the evaluation function for each, collects predictions, and computes per-category accuracy and the balanced PINT score.

Key considerations:

Provide a descriptive model_name for clear result labeling
Because the evaluation function is called once per input, slow or rate-limited APIs can make a full run take a long time
The benchmark uses the default PINT dataset unless a custom dataset or DataFrame is provided
Add error handling inside the eval function so that a failure on an individual input does not halt the full run
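
One possible way to handle per-input failures and transient API errors is to wrap the evaluation function before handing it to the benchmark. The wrapper below is a sketch under assumptions: with_retries, its parameters, and the choice to fall back to False after repeated failures are all hypothetical and not part of the PINT Benchmark.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("pint_eval")


def with_retries(
    eval_function: Callable[[str], bool],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    default_on_failure: bool = False,
) -> Callable[[str], bool]:
    """Wrap an eval function so a failing input does not stop the run.

    Retries with exponential backoff, then falls back to a default verdict.
    The fallback counts as a prediction and therefore affects the scores.
    """

    def safe_eval(text: str) -> bool:
        for attempt in range(1, max_attempts + 1):
            try:
                return eval_function(text)
            except Exception as exc:  # broad on purpose: any detector error
                logger.warning(
                    "attempt %d/%d failed: %s", attempt, max_attempts, exc
                )
                if attempt < max_attempts:
                    time.sleep(backoff_seconds * 2 ** (attempt - 1))
        return default_on_failure

    return safe_eval
```

In the notebook, the wrapped function would then be passed to pint_benchmark() together with a model_name. Any argument names other than model_name should be checked against the notebook's definition. Because fallback verdicts are scored like any other prediction, it is worth reviewing the logged warnings before trusting the results.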

Step 4: Results Analysis

Review the benchmark output table showing per-category accuracy breakdown and the overall balanced PINT score. Compare results against the published leaderboard. Analyze category-specific performance to understand the system's strengths (e.g., jailbreak detection) and weaknesses (e.g., false positive rate on hard negatives).

Key considerations:

The balanced score weights all categories equally
High accuracy on the hard_negatives category indicates a low false positive rate on those inputs
Results should be verified by the Lakera team before official inclusion in the leaderboard
Save results as markdown for reproducibility
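
To make the equal weighting of categories concrete, the sketch below computes per-category accuracy and an unweighted mean over a handful of toy records, then renders a markdown table. It takes the statement above at face value and is not the notebook's implementation; the helper names and the toy data are hypothetical, and the actual PINT score computation may differ in detail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (category, expected_label, predicted_label). Toy data only.
Record = Tuple[str, bool, bool]


def per_category_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of correct predictions within each category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, expected, predicted in records:
        total[category] += 1
        correct[category] += int(expected == predicted)
    return {cat: correct[cat] / total[cat] for cat in total}


def balanced_score(accuracy_by_category: Dict[str, float]) -> float:
    """Unweighted mean of category accuracies: every category counts equally."""
    return sum(accuracy_by_category.values()) / len(accuracy_by_category)


def to_markdown(accuracy_by_category: Dict[str, float], score: float) -> str:
    """Render the results as a markdown table for saving alongside the run."""
    lines = ["| category | accuracy |", "| --- | --- |"]
    for cat in sorted(accuracy_by_category):
        lines.append(f"| {cat} | {accuracy_by_category[cat]:.4f} |")
    lines.append(f"| balanced score | {score:.4f} |")
    return "\n".join(lines)


toy_records = [
    ("jailbreak", True, True),
    ("jailbreak", True, False),
    ("hard_negatives", False, False),
    ("hard_negatives", False, False),
    ("hard_negatives", False, True),
    ("hard_negatives", False, False),
]
accuracy = per_category_accuracy(toy_records)
score = balanced_score(accuracy)
report = to_markdown(accuracy, score)
```

In this toy data the larger hard_negatives category does not dominate the score, which is the point of weighting categories equally. The report string can be written to a file, for example with pathlib.Path.write_text, to keep a reproducible record of the run.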
