Workflow:Lakeraai Pint benchmark Custom System Evaluation

From Leeroopedia
Revision as of 11:04, 16 February 2026 by Admin (Auto-imported from workflows/Lakeraai_Pint_benchmark_Custom_System_Evaluation.md)
Knowledge Sources
Domains: AI_Security, Benchmarking, Prompt_Injection_Detection
Last Updated: 2026-02-14 14:00 GMT

Overview

End-to-end process for benchmarking a custom (non-Hugging Face) prompt injection detection system against the PINT dataset by supplying a custom evaluation function as a callback.

Description

This workflow covers the procedure for evaluating any prompt injection detection system that is not a standard Hugging Face model. This includes commercial API services (such as Lakera Guard, AWS Bedrock Guardrails, Azure AI Prompt Shield, Google Model Armor), third-party libraries (such as WhyLabs LangKit), or custom in-house detection logic. The user writes a thin adapter function that accepts a single text string and returns a boolean indicating whether an injection was detected. This function is then passed to pint_benchmark() as a callback, and the benchmark handles dataset iteration, scoring, and result generation.

Usage

Execute this workflow when you want to benchmark a prompt injection detection system that does not use the standard Hugging Face text-classification pipeline. This includes API-based commercial services, custom rule-based detectors, ensemble systems, or any detection tool that exposes a programmatic interface returning injection verdicts. The custom eval function pattern is the most flexible integration path into the PINT Benchmark.

Execution Steps

Step 1: Environment Setup

Install the PINT Benchmark dependencies via Poetry, then install any additional libraries required by the target detection system. For API-based services, configure authentication credentials (API keys, endpoints). For library-based tools, install the relevant package from within the benchmark notebook.

Key considerations:

  • Use Poetry for the base environment (poetry install)
  • Install detection system dependencies via pip in the notebook
  • Store API keys securely; do not commit credentials to the repository
  • Verify network connectivity for cloud-based detection APIs
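One way to keep credentials out of the repository is to read them from the environment at notebook start-up. The following is a minimal sketch; the variable name DETECTOR_API_KEY and the helper load_api_key are hypothetical and not part of the PINT Benchmark.

```python
import os


def load_api_key(env_var: str = "DETECTOR_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption for illustration; use whatever
    name your detection service's documentation recommends.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early, before the benchmark starts issuing requests.
        raise RuntimeError(f"Set the {env_var} environment variable before running the benchmark")
    return key
```

Failing early on a missing key avoids discovering the problem only after the benchmark has started calling the eval function.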

Step 2: Define Evaluation Function

Write a custom Python function that conforms to the PINT Benchmark's callback interface. The function must accept a single str parameter (the input text to classify) and return a bool (True if prompt injection is detected, False otherwise). The function encapsulates all interaction with the detection system, including API calls, response parsing, and threshold application.

Key considerations:

  • Function signature must be eval_function(text: str) -> bool
  • Handle any necessary response parsing (e.g., confidence thresholds, label mapping)
  • Consider rate limiting for API-based services
  • The function is called once per dataset input (4,314 times for the full PINT dataset)

Pseudocode:

Define function accepting text string
Call detection system with text
Parse response to extract injection verdict
Apply threshold or label mapping
Return boolean result
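The pseudocode above could be realized roughly as follows. This is a sketch under stated assumptions: call_detector is a hypothetical stand-in for the real API or library call, and the response shape (label, score) and the 0.5 threshold are invented for illustration. Only the eval_function(text: str) -> bool signature comes from the workflow itself.

```python
THRESHOLD = 0.5  # assumed confidence cut-off; tune for the system under test


def call_detector(text: str) -> dict:
    """Hypothetical stand-in for a real detection client.

    Replace the body with the actual API request or library call.
    """
    suspicious = "ignore previous instructions" in text.lower()
    return {
        "label": "INJECTION" if suspicious else "SAFE",
        "score": 0.97 if suspicious else 0.02,
    }


def eval_function(text: str) -> bool:
    """Adapter: one input string in, one boolean injection verdict out."""
    response = call_detector(text)
    # Map the detector's label and confidence onto a single boolean.
    return bool(response["label"] == "INJECTION" and response["score"] >= THRESHOLD)
```

Keeping the label mapping and threshold inside the adapter means the benchmark only ever sees the boolean verdict, whatever the detector's native response format looks like.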

Step 3: Benchmark Execution

Pass the custom evaluation function to pint_benchmark() along with a descriptive model_name string. The benchmark loads the PINT dataset, iterates over all inputs, calls the evaluation function for each, collects predictions, and computes per-category accuracy and the balanced PINT score.

Key considerations:

  • Provide a descriptive model_name for clear result labeling
  • For slow APIs, expect an extended runtime, since the eval function is called once per dataset input
  • The benchmark uses the default PINT dataset unless a custom dataset or DataFrame is provided
  • Add error handling inside the eval function so that an individual failure (e.g., a timeout) does not halt the full run
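For slow or flaky APIs, the eval function can be wrapped so that transient errors are retried and a persistent failure yields a fallback verdict instead of an exception. The wrapper below is an illustrative sketch; make_resilient and its parameters are hypothetical names, not part of the PINT Benchmark.

```python
import time
from typing import Callable


def make_resilient(
    fn: Callable[[str], bool],
    retries: int = 3,
    delay_s: float = 1.0,
    fallback: bool = False,
) -> Callable[[str], bool]:
    """Wrap an eval function with retries and a fallback verdict."""

    def wrapped(text: str) -> bool:
        for attempt in range(retries):
            try:
                return bool(fn(text))
            except Exception as exc:  # broad on purpose: one bad call must not stop the run
                print(f"attempt {attempt + 1}/{retries} failed: {exc}")
                if attempt < retries - 1:
                    # Linear backoff also eases pressure on rate-limited APIs.
                    time.sleep(delay_s * (attempt + 1))
        # All attempts failed: return the fallback so the run can continue.
        return fallback

    return wrapped
```

The wrapped function would then be passed to pint_benchmark() in place of the raw adapter, together with a descriptive model_name; the exact keyword names depend on the notebook's definition of pint_benchmark(). Note that a fallback verdict is still counted as a prediction, so frequent failures will distort the score and are worth logging.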

Step 4: Results Analysis

Review the benchmark output table showing per-category accuracy breakdown and the overall balanced PINT score. Compare results against the published leaderboard. Analyze category-specific performance to understand the system's strengths (e.g., jailbreak detection) and weaknesses (e.g., false positive rate on hard negatives).

Key considerations:

  • The balanced score weights all categories equally
  • High accuracy on the hard_negatives category indicates a low false positive rate
  • Results should be verified by the Lakera team before official inclusion in the leaderboard
  • Save results as markdown for reproducibility
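To make the equal weighting of categories concrete, the sketch below computes per-category accuracy and then averages the categories with equal weight. It is an illustration of the idea only, not the benchmark's own implementation; the function names and the toy rows are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each row: (category, true label, predicted label)
Row = Tuple[str, bool, bool]


def per_category_accuracy(rows: Iterable[Row]) -> Dict[str, float]:
    """Fraction of correct predictions within each category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, label, prediction in rows:
        total[category] += 1
        correct[category] += int(label == prediction)
    return {category: correct[category] / total[category] for category in total}


def balanced_score(accuracy: Dict[str, float]) -> float:
    """Unweighted mean over categories, so large categories do not dominate."""
    return sum(accuracy.values()) / len(accuracy)


# Toy data for illustration only; not real PINT results.
toy_rows = [
    ("jailbreak", True, True),
    ("jailbreak", True, False),
    ("hard_negatives", False, False),
    ("hard_negatives", False, False),
]
toy_accuracy = per_category_accuracy(toy_rows)
print(toy_accuracy)
print(balanced_score(toy_accuracy))  # 0.75 for the toy rows above
```

With the toy rows, the jailbreak category scores 0.5 and hard_negatives scores 1.0, so the equally weighted mean is 0.75 regardless of how many rows each category contains.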

Execution Diagram

GitHub URL

Workflow Repository