Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Lakeraai Pint benchmark Eval Function Callback

From Leeroopedia
Knowledge Sources
Domains Software_Design, Prompt_Injection, Model_Evaluation
Last Updated 2026-02-14 14:00 GMT

Overview

Interface specification for user-defined evaluation functions that integrate custom prompt injection detection systems with the PINT Benchmark.

Description

This is a Pattern Doc — it documents a user-defined interface, not a library API. The eval_function callback is the integration point between the benchmark runner and any prompt injection detection system. Users implement this function to wrap their specific system's API, SDK, or model inference behind a standardized (str) -> bool signature.

The repository provides two reference implementations:

  • evaluate_lakera_guard (cell-11 in the notebook) — wraps the Lakera Guard REST API
  • evaluate_langkit (in examples/whylabs/langkit.md) — wraps WhyLabs LangKit's injection scoring

Usage

Implement this interface when benchmarking any non-HuggingFace detection system, or when the built-in HuggingFaceModelEvaluation wrapper does not support your model's architecture.

Code Reference

Source Location

  • Repository: pint-benchmark
  • File: benchmark/pint-benchmark.ipynb (cell-11, reference implementation: evaluate_lakera_guard)
  • File: examples/whylabs/langkit.md (lines 26-34, example: evaluate_langkit)
  • File: README.md (lines 129-134, interface documentation)

Signature

def eval_function(prompt: str) -> bool:
    """
    Evaluate a single prompt for prompt injection.

    This is the interface contract that all evaluation functions must follow.
    The function name can be anything — it is passed as a callback parameter.

    Args:
        prompt: The text input to evaluate for prompt injection.

    Returns:
        True if prompt injection is detected, False otherwise.
    """
    ...

Import

# No import needed — users define their own function
# The function is passed as a callback to pint_benchmark()

I/O Contract

Inputs

Name Type Required Description
prompt str Yes A single text string to evaluate for prompt injection

Outputs

Name Type Description
return value bool True if prompt injection detected, False otherwise

Usage Examples

API-Based System (Lakera Guard Reference)

import os
import requests

lakera_session = requests.Session()
lakera_session.headers.update(
    {"Authorization": f'Bearer {os.environ.get("LAKERA_GUARD_API_KEY")}'}
)

def evaluate_lakera_guard(prompt: str) -> bool:
    """Reference implementation: Lakera Guard API."""
    response = lakera_session.post(
        "https://api.lakera.ai/v1/prompt_injection",
        json={"input": prompt},
    )
    response.raise_for_status()
    return response.json()["results"][0]["flagged"]

# Use with benchmark
pint_benchmark(
    df=df,
    eval_function=evaluate_lakera_guard,
    model_name="Lakera Guard",
)

SDK-Based System (WhyLabs LangKit)

from langkit import injections, extract

schema = injections.init()

def evaluate_langkit(prompt: str) -> bool:
    """WhyLabs LangKit injection scoring with 0.5 threshold."""
    result = extract({"prompt": prompt}, schema=schema)
    return result["prompt.injection"] > 0.5

pint_benchmark(
    df=df,
    eval_function=evaluate_langkit,
    model_name="WhyLabs LangKit",
)

Custom Score-Based System

import requests

def evaluate_my_system(prompt: str) -> bool:
    """Custom system with configurable threshold."""
    response = requests.post(
        "https://my-detection-api.example.com/analyze",
        json={"text": prompt},
        headers={"Authorization": "Bearer MY_API_KEY"},
    )
    response.raise_for_status()
    score = response.json()["injection_score"]
    return score > 0.5  # Apply threshold

pint_benchmark(
    df=df,
    eval_function=evaluate_my_system,
    model_name="My Custom System",
)

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment