Principle:Lakeraai Pint benchmark HF Model Wrapping
| Knowledge Sources | |
|---|---|
| Domains | NLP, Model_Evaluation, Prompt_Injection |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
A design pattern that wraps heterogeneous Hugging Face model architectures behind a unified evaluation interface for prompt injection detection.
Description
When benchmarking prompt injection detection, models from Hugging Face Hub come in diverse architectures (standard sequence classification, SetFit few-shot models) with different tokenization requirements, context length limits, and output formats. HF Model Wrapping solves this fragmentation by encapsulating the model loading, tokenizer initialization, and classification pipeline setup into a single constructor. This allows downstream benchmark code to call a uniform evaluate(prompt) -> bool method regardless of the underlying model type.
The pattern handles three key challenges:
- Architecture branching: Standard HuggingFace text-classification pipelines vs. SetFit predictors require different loading and inference paths.
- Context length management: Models have varying maximum token lengths. The wrapper auto-detects max_position_embeddings from the model config or falls back to a default of 512 tokens.
- Long input chunking: Prompts exceeding the model's context window are split into overlapping chunks (25% stride) to ensure prompt injections near chunk boundaries are not missed.
Usage
Use this pattern when you need to evaluate any Hugging Face text-classification or SetFit model against a benchmark dataset. It is the entry point for the Hugging Face Model Evaluation workflow in the PINT Benchmark and should be instantiated once per model before passing its evaluate method to the benchmark runner.
Theoretical Basis
The wrapping pattern follows the Adapter design pattern from object-oriented design:
# Abstract algorithm (NOT real implementation)
wrapper = ModelWrapper(model_id, config_params)
# Internally:
# 1. Load model (HF pipeline OR SetFit)
# 2. Load tokenizer
# 3. Determine max_length from config or default
# 4. Create classifier (pipeline or predict method)
result = wrapper.evaluate(prompt)
# Internally:
# 1. Tokenize with chunking + stride overlap
# 2. Classify each chunk
# 3. Return True if ANY chunk flagged as injection
The chunking strategy uses a 25% overlap stride to prevent false negatives at chunk boundaries:
$$\text{stride} = \left\lfloor \frac{\text{max\_length}}{4} \right\rfloor$$
This ensures that a prompt injection payload spanning two chunks will be fully contained within at least one chunk.
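The following is an added example of the wrapping pattern described above (the loading steps, the chunking with 25% stride, and the any-chunk-flags rule); it is not the PINT Benchmark's own code. To run offline it substitutes a whitespace tokenizer and keyword-based fake models for real Hugging Face artifacts; the two adapters only assume the output shapes of a transformers text-classification pipeline (a list of {"label", "score"} dicts for one input) and of a SetFit model's predict() (one label per input text). The `every_span_covered` helper makes the boundary guarantee precise: with overlap `stride`, any contiguous payload of at most `stride + 1` tokens is guaranteed to land whole in some chunk, while longer payloads can still straddle a boundary.

```python
from typing import Callable, Dict, List, Optional, Sequence

DEFAULT_MAX_LENGTH = 512  # fallback when the model config has no max_position_embeddings

# The two pieces the wrapper adapts (assumed shapes; see lead-in).
Tokenizer = Callable[[str], List[str]]  # prompt text -> list of tokens
Classifier = Callable[[str], bool]      # chunk text -> True if flagged as injection


def stride_for(max_length: int) -> int:
    """Overlap between consecutive chunks: floor(max_length / 4)."""
    return max_length // 4


def chunk_tokens(tokens: Sequence[str], max_length: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of max_length; consecutive windows share `stride` tokens."""
    if max_length < 1 or not 0 <= stride < max_length:
        raise ValueError("need max_length >= 1 and 0 <= stride < max_length")
    if len(tokens) <= max_length:
        return [list(tokens)]
    step = max_length - stride
    chunks, start = [], 0
    while True:
        chunks.append(list(tokens[start:start + max_length]))
        if start + max_length >= len(tokens):
            return chunks
        start += step


def every_span_covered(n_tokens: int, max_length: int, stride: int, span_len: int) -> bool:
    """True if every contiguous run of span_len tokens lies wholly inside one chunk."""
    bounds, start = [], 0
    while True:
        bounds.append((start, min(start + max_length, n_tokens)))
        if start + max_length >= n_tokens:
            break
        start += max_length - stride
    return all(any(c0 <= s and s + span_len <= c1 for c0, c1 in bounds)
               for s in range(n_tokens - span_len + 1))


def adapt_hf_pipeline(pipe, injection_label: str = "INJECTION") -> Classifier:
    """Adapter for a text-classification pipeline returning [{"label": ..., "score": ...}]."""
    return lambda text: pipe(text)[0]["label"] == injection_label


def adapt_setfit(model, injection_label=1) -> Classifier:
    """Adapter for a SetFit model whose predict(list_of_texts) returns one label per text."""
    return lambda text: model.predict([text])[0] == injection_label


class ModelWrapper:
    """Uniform evaluate(prompt) -> bool over any (tokenizer, classifier) pair."""

    def __init__(self, tokenizer: Tokenizer, classifier: Classifier,
                 model_config: Optional[Dict] = None, stride: Optional[int] = None):
        self.tokenizer = tokenizer
        self.classifier = classifier
        cfg = model_config or {}
        self.max_length = int(cfg.get("max_position_embeddings", DEFAULT_MAX_LENGTH))
        self.stride = stride_for(self.max_length) if stride is None else stride

    def evaluate(self, prompt: str) -> bool:
        tokens = self.tokenizer(prompt)
        for chunk in chunk_tokens(tokens, self.max_length, self.stride):
            if self.classifier(" ".join(chunk)):
                return True  # any flagged chunk flags the whole prompt
        return False


# ---- constructed stand-ins so the example runs offline (not real models) ----
def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def fake_hf_pipeline(text: str):
    hit = "ignore instructions" in text.lower()
    return [{"label": "INJECTION" if hit else "SAFE", "score": 0.99}]


class FakeSetFitModel:
    def predict(self, texts):
        return [1 if "ignore instructions" in t.lower() else 0 for t in texts]


# Constructed 20-token prompt: the 2-token payload sits at positions 7-8 and
# straddles the end of the first 8-token chunk.
PROMPT = " ".join([f"w{i}" for i in range(7)] + ["ignore", "instructions"]
                  + [f"w{i}" for i in range(9, 20)])
CONFIG = {"max_position_embeddings": 8}  # constructed tiny context window


def build_wrapper(kind: str, stride: Optional[int] = None) -> ModelWrapper:
    """Architecture branching: pick the adapter by model kind."""
    if kind == "setfit":
        classifier = adapt_setfit(FakeSetFitModel())
    elif kind == "hf":
        classifier = adapt_hf_pipeline(fake_hf_pipeline)
    else:
        raise ValueError(f"unknown model kind: {kind}")
    return ModelWrapper(whitespace_tokenizer, classifier, CONFIG, stride)


if __name__ == "__main__":
    print(build_wrapper("hf").evaluate(PROMPT))                # True: stride 2 keeps payload whole
    print(build_wrapper("setfit", stride=0).evaluate(PROMPT))  # False: no overlap splits it
```

Running the module shows the point of the overlap directly: with the default stride of 2 the payload appears intact in the second chunk, while with no overlap it is cut in half and neither piece is flagged. When replacing the stand-ins with real models, the same chunking can be obtained from a Hugging Face tokenizer's `return_overflowing_tokens=True` with its `stride` argument, which has the same overlap meaning as above.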