Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Lakeraai Pint benchmark Custom Eval Function Interface

From Leeroopedia
Knowledge Sources
Domains Software_Design, Prompt_Injection, Model_Evaluation
Last Updated 2026-02-14 14:00 GMT

Overview

A callback interface pattern that decouples the benchmark execution loop from the specific prompt injection detection system being evaluated.

Description

The Custom Eval Function Interface defines a contract: any prompt injection detection system can be benchmarked by implementing a single function with the signature eval_function(prompt: str) -> bool. This function accepts a text string and returns True if prompt injection is detected, False otherwise.

This pattern solves the problem of heterogeneous detection system APIs. Prompt injection detectors vary widely:

  • API-based services (Lakera Guard, AWS Bedrock Guardrails, Azure AI Content Safety) require HTTP calls with different authentication and response formats.
  • Local models (HuggingFace transformers, SetFit) require model loading and inference.
  • SDK-based tools (WhyLabs LangKit) have their own Python APIs with unique output schemas.

By standardizing the interface to a single (str) -> bool function, the benchmark runner can evaluate any system without modification.

Usage

Use this pattern when integrating a new prompt injection detection system into the PINT Benchmark. You must write a wrapper function that calls your system's API or SDK and returns a boolean. This function is then passed as the eval_function parameter to pint_benchmark().

Theoretical Basis

The pattern follows the Strategy design pattern (also known as the Callback pattern):

# Abstract interface specification (NOT real implementation)
def eval_function(prompt: str) -> bool:
    """
    Contract:
    - Input: A single text string (the prompt to evaluate)
    - Output: True if prompt injection detected, False otherwise
    - Side effects: None required (may make API calls internally)
    - Error handling: Should raise exceptions on failure (not return None)
    """
    ...

The interface is deliberately minimal: a single string input and a single boolean output. This simplicity ensures that:

  1. Any system can be adapted with minimal wrapper code
  2. The benchmark loop does not need to know about authentication, response parsing, or model internals
  3. New detection systems can be added without modifying the benchmark code

Practical Guide

Step 1: Identify Your System's Detection Method

Determine how your system classifies prompts:

  • API endpoint returning a JSON response with a flag field
  • SDK function returning a score or label
  • Local model returning predictions

Step 2: Write the Wrapper Function

Create a function that:

  1. Calls your system with the prompt string
  2. Parses the response to extract the injection detection result
  3. Returns True for injection, False for benign

Step 3: Handle Thresholds

If your system returns a score rather than a boolean, apply a threshold:

  • Common threshold: > 0.5 for injection
  • Adjust based on your system's calibration

Step 4: Pass to pint_benchmark

pint_benchmark(
    df=df,
    eval_function=your_eval_function,
    model_name="Your System Name",
)

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment