Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Lakeraai Pint benchmark Benchmark Execution

From Leeroopedia
Knowledge Sources
Domains Model_Evaluation, Benchmarking, Prompt_Injection
Last Updated 2026-02-14 14:00 GMT

Overview

A systematic evaluation procedure that applies a detection function to every sample in a labeled dataset and aggregates per-category accuracy metrics with balanced scoring.

Description

Benchmark execution is the core evaluation loop of the PINT Benchmark. Given a labeled dataset of text samples (with categories like "prompt_injection", "jailbreak", "chat", "documents") and a detection function, the benchmark:

  1. Iterates through every dataset row, passing the text to the evaluation function
  2. Records whether each prediction matches the ground truth label
  3. Groups results by category and label (True/False) to compute per-group accuracy
  4. Computes an overall score using balanced or imbalanced weighting

This addresses the fundamental challenge of evaluating prompt injection detection systems: the need for category-level granularity (not just overall accuracy) and balanced scoring to handle intentionally imbalanced datasets where benign samples vastly outnumber malicious ones.

Usage

Use this technique whenever you need to evaluate a prompt injection detection system (Hugging Face model, API-based service, or custom system) against a structured dataset. It is the central step in all three PINT Benchmark workflows: Hugging Face Model Evaluation, Custom System Evaluation, and Custom Dataset Benchmarking.

Theoretical Basis

The benchmark follows a stratified evaluation with balanced accuracy approach:

# Abstract algorithm (NOT real implementation)
for each row in dataset:
    prediction = eval_function(row.text)
    row.correct = (prediction == row.label)

# Group by category and label
results = groupby(dataset, [category, label]).aggregate(mean, sum, count)

# Balanced accuracy: mean of per-label accuracies
accuracy_per_label = groupby(results, label).aggregate(correct/total)
balanced_score = mean(accuracy_per_label)

The balanced accuracy formula:

Balanced Score=12(TPTP+FN+TNTN+FP)

Where:

  • TP = True Positives (injections correctly detected)
  • FN = False Negatives (injections missed)
  • TN = True Negatives (benign correctly passed)
  • FP = False Positives (benign incorrectly flagged)

This prevents a naive "always benign" classifier from scoring highly on an imbalanced dataset.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment