Workflow: Lakera AI PINT Benchmark Hugging Face Model Evaluation
| Knowledge Sources | |
|---|---|
| Domains | AI_Security, Benchmarking, Prompt_Injection_Detection |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
End-to-end process for benchmarking a Hugging Face prompt injection detection model against the PINT dataset using the built-in HuggingFaceModelEvaluation utility class.
Description
This workflow covers the standard procedure for evaluating an open-source Hugging Face model's ability to detect prompt injections. It uses the HuggingFaceModelEvaluation utility class provided by the PINT Benchmark to wrap any compatible Hugging Face text-classification model. The utility handles tokenization, input chunking with overlap stride (25% of max context length), and classification aggregation so that long inputs exceeding the model's context window are still properly evaluated. The benchmark then computes per-category accuracy and a balanced overall PINT score.
The process covers dependency installation, model instantiation with the correct injection label and max length, running the benchmark notebook, and interpreting the per-category results table.
Usage
Execute this workflow when you want to evaluate a Hugging Face text-classification model (standard or SetFit) for prompt injection detection performance. Typical triggers include comparing a new open-source model against the PINT leaderboard, validating a fine-tuned model before deployment, or reproducing published PINT scores for existing models such as ProtectAI DeBERTa, Deepset DeBERTa, FMOps DistilBERT, or Meta Llama Prompt Guard.
Execution Steps
Step 1: Environment Setup
Install the required Python dependencies for running the PINT Benchmark and the Hugging Face model. This includes the transformers library, torch, and the PINT benchmark's own dependencies declared in pyproject.toml. For SetFit models, the setfit package must also be installed. Use Poetry for environment management as specified by the project.
Key considerations:
- Use Poetry to manage the Python environment (poetry install)
- Install additional model-specific dependencies via pip in the notebook
- Ensure GPU drivers are available if CUDA acceleration is desired
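The setup above might look like the following, assuming the Poetry-based workflow documented in the PINT Benchmark repository (command names follow the project README; verify against your checkout):

```shell
# Clone the PINT Benchmark and install its declared dependencies via Poetry.
git clone https://github.com/lakeraai/pint-benchmark.git
cd pint-benchmark
poetry install

# Model-specific extras (e.g. SetFit support) can be added in the notebook or here.
poetry run pip install setfit

# Launch the benchmark notebook.
poetry run jupyter notebook
```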
Step 2: Model Instantiation
Create an instance of the HuggingFaceModelEvaluation class with the target model's configuration. This requires specifying the Hugging Face model identifier, the label string the model uses for injection detection, and optionally the maximum input length. The constructor automatically loads the model weights, initializes the tokenizer, and creates a text-classification pipeline (or SetFit predictor).
Key considerations:
- The injection_label must exactly match the model's output label for injections (e.g., "INJECTION", "LABEL_1")
- The max_length defaults to the model's max_position_embeddings config value if not specified
- For SetFit models, pass is_setfit=True and optionally specify tokenizer_model if it differs from the model name
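A minimal instantiation sketch, assuming the class and argument names described in this workflow; the exact import path, argument defaults, and the SetFit model identifier shown are assumptions, not a pinned API:

```python
# Illustrative configuration; argument names mirror the workflow description above.
model_config = {
    "model_name": "protectai/deberta-v3-base-prompt-injection",
    "injection_label": "INJECTION",  # must exactly equal the model's positive output label
    "max_length": 512,               # otherwise defaults to max_position_embeddings
}

# Commented out because construction downloads model weights:
# model = HuggingFaceModelEvaluation(**model_config)

# SetFit variant (model id below is hypothetical):
# model = HuggingFaceModelEvaluation(
#     model_name="example-org/setfit-injection-detector",
#     injection_label="INJECTION",
#     is_setfit=True,
# )
```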
Step 3: Benchmark Execution
Pass the model's evaluate method as the eval_function argument to the pint_benchmark() function in the Jupyter Notebook. Also provide the model_name for labeling the results. The benchmark iterates over all 4,314 inputs in the PINT dataset, calling the evaluation function for each input and collecting predictions.
Key considerations:
- The evaluation function has the signature (text: str) -> bool, returning True when an injection is detected
- Long inputs are automatically chunked by the utility with 25% overlap stride
- A single positive chunk triggers a positive classification for the entire input
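The chunk-and-aggregate behavior described in the considerations above can be sketched in plain Python. The 25% overlap stride and the any-chunk-positive rule follow this workflow's description; the function names are illustrative, not the utility's actual API:

```python
def chunk_text(tokens: list[str], max_length: int) -> list[list[str]]:
    """Split a token sequence into windows of max_length tokens,
    where consecutive windows overlap by 25% of max_length."""
    stride = max(1, int(max_length * 0.25))
    step = max_length - stride  # each window starts 75% of max_length further along
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break
    return chunks

def evaluate_long_input(tokens, max_length, classify) -> bool:
    """Return True if any chunk is classified as an injection."""
    return any(classify(chunk) for chunk in chunk_text(tokens, max_length))

# Toy classifier: flags any chunk containing the token "IGNORE".
toy = lambda chunk: "IGNORE" in chunk
tokens = ["hello"] * 100 + ["IGNORE"] + ["world"] * 20
print(evaluate_long_input(tokens, max_length=32, classify=toy))  # True
```

Because a single positive chunk makes the whole input positive, an injection buried past the model's context window is still caught, at the cost of some extra false-positive surface on very long benign documents.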
Step 4: Results Interpretation
The benchmark outputs a results table showing per-category accuracy (prompt_injection, jailbreak, hard_negatives, chat, documents) and an overall balanced PINT score. Review the per-category breakdown to identify the model's strengths and weaknesses across different input types and languages.
Key considerations:
- The balanced score weights categories equally regardless of sample count
- Hard negatives test false positive resistance (benign inputs that look like injections)
- Compare results against the published PINT leaderboard for context
- Results can be saved as markdown files in the results/ directory
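A score that weights categories equally, as described above, reduces to the unweighted mean of per-category accuracies. The sketch below assumes that computation, and the per-category numbers are made up purely for illustration:

```python
def balanced_pint_score(category_accuracy: dict[str, float]) -> float:
    """Unweighted mean of per-category accuracies: each category counts
    equally, regardless of how many samples it contains."""
    return sum(category_accuracy.values()) / len(category_accuracy)

# Hypothetical per-category results for illustration only.
results = {
    "prompt_injection": 0.90,
    "jailbreak": 0.85,
    "hard_negatives": 0.70,
    "chat": 0.95,
    "documents": 0.80,
}
print(f"{balanced_pint_score(results):.4f}")  # 0.8400
```

Note how a weak hard_negatives score (false positives on benign look-alike inputs) drags the overall score down as much as a weak injection-detection category would.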