Heuristic:Lakeraai Pint benchmark Chunking Stride 25 Percent Overlap

Knowledge Sources	PINT Benchmark Lakera AI Internal
Domains	NLP, Optimization, Prompt_Injection
Last Updated	2026-02-14 15:00 GMT

Overview

Tokenization overlap strategy using a 25% stride when chunking long inputs to prevent missed prompt injection detections at chunk boundaries.

Description

When evaluating prompts that exceed a model's maximum token length, the input must be split into multiple chunks. Naive chunking (no overlap) risks splitting a prompt injection payload across two chunks, causing both chunks to appear benign individually. The PINT Benchmark addresses this by using a sliding window with a 25% overlap stride: each successive chunk starts 75% of max_length tokens ahead of the previous one, so 25% of each chunk's content overlaps with the next.

This strategy is combined with an any-positive aggregation: if any chunk is classified as injection, the entire input is flagged as positive. Together, the overlap and aggregation ensure that injections embedded deep within long documents or positioned near chunk boundaries are reliably detected.

Usage

Apply this heuristic when evaluating prompt injection detection models on inputs that may exceed the model's context window (typically 512 tokens). It is built into the HuggingFaceModelEvaluation._chunk_input() method and applies automatically to all models evaluated through the PINT Benchmark's HuggingFace evaluation utility.

The Insight (Rule of Thumb)

Action: Set the stride parameter to 25% of max_length when calling the tokenizer with return_overflowing_tokens=True.
Value: stride = math.floor(max_length / 4). For a typical max_length=512, the stride is 128 tokens.
Trade-off: Increases the number of chunks (and therefore inference calls) compared to non-overlapping chunking. For a 2048-token input with max_length=512 and stride=128: non-overlapping produces 4 chunks, while 25% overlap produces approximately 5 chunks (~25% more inference).
Aggregation: Use any() over chunk predictions — if any chunk is flagged as injection, the entire input is positive.

Reasoning

Prompt injection payloads embedded in long documents (e.g., a system prompt followed by thousands of tokens of context, with an injection near the end) may land exactly at the boundary between two chunks. Without overlap, the payload could be split in half across chunks, with neither half containing enough signal for the classifier to detect it.

The 25% overlap value is a practical balance:

Too little overlap (e.g., 5-10%): Risk of still missing boundary injections, especially for payloads spanning many tokens.
Too much overlap (e.g., 50%): Doubles the number of chunks, significantly increasing inference time without proportional detection improvement.
25% overlap: Provides a safety margin of 128 tokens (at max_length=512) while keeping the additional compute cost moderate.

The comment in the source code explicitly states the rationale: "add overlap of 25% of the max length to make sure we aren't missing prompt injections at the edge of the model's context length when chunking."

Code Evidence

Chunking stride logic from benchmark/utils/evaluate_hugging_face_model.py:77-88:

tokenized = self.tokenizer(
    prompt,
    max_length=self.max_length,
    truncation=True,
    return_overflowing_tokens=True,
    padding=True,
    return_tensors="pt",
    add_special_tokens=False,
    # add overlap of 25% of the max length to make sure we aren't
    # missing prompt injections at the edge of the model's context
    # length when chunking
    stride=math.floor(self.max_length / 4),
)

Any-positive aggregation from benchmark/utils/evaluate_hugging_face_model.py:135-148:

def evaluate(self, prompt: str) -> bool:
    tokenized_chunks = self._chunk_input(prompt=prompt)
    evaluations = self._evaluate_chunks(tokenized_chunks=tokenized_chunks)

    if self.is_setfit:
        return any(evaluation == 1 for evaluation in evaluations)
    else:
        return any(
            [
                evaluation["label"] == self.injection_label
                for evaluation in evaluations
            ]
        )

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment