Heuristic:Liu00222 Open Prompt Injection Cosine Similarity Segmentation Threshold
| Knowledge Sources | |
|---|---|
| Domains | NLP, Security, Optimization |
| Last Updated | 2026-02-14 15:30 GMT |
Overview
Text segmentation for injection localization uses a cosine similarity threshold of 0.0 between consecutive word embeddings to determine segment boundaries.
Description
The PromptLocate defense splits input text into segments before performing binary search injection localization. The segmentation uses a two-stage approach: first, spaCy sentence segmentation with a custom regex-based sentence boundary detector; second, within each sentence, consecutive word embeddings are compared using cosine similarity, and segment boundaries are placed where similarity drops below a threshold. The default threshold of 0.0 means that segments are only split when consecutive word embeddings form an obtuse angle (negative cosine similarity, i.e., more than 90 degrees apart), preserving most sentence-level coherence.
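The word-level split rule can be sketched independently of the PromptLocate code. The snippet below is a toy illustration, not the actual implementation: `split_by_cosine` is a hypothetical helper, and the embeddings are hand-picked 2-D vectors rather than model embeddings.

```python
import numpy as np

def split_by_cosine(words, embeddings, sep_thres=0.0):
    """Split `words` wherever cos(e_i, e_{i+1}) < sep_thres (toy sketch)."""
    segments, current = [], [words[0]]
    for i in range(len(words) - 1):
        a, b = embeddings[i], embeddings[i + 1]
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if cos < sep_thres:
            # Consecutive embeddings are "semantically opposed": start a new segment.
            segments.append(current)
            current = []
        current.append(words[i + 1])
    segments.append(current)
    return segments

# Hand-picked vectors: a sharp direction change between "weather" and "ignore".
words = ["the", "weather", "ignore", "previous"]
embs = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0], [-0.8, -0.1]])
print(split_by_cosine(words, embs))  # [['the', 'weather'], ['ignore', 'previous']]
```

With real input-embedding vectors the cosine values are rarely this extreme, which is why a threshold of 0.0 splits only at unusually sharp semantic breaks.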
Usage
Use this heuristic when configuring or debugging PromptLocate segmentation. The threshold is set via the `sep_thres` parameter in the `PromptLocate` constructor (default 0.0). Increasing the threshold creates more, smaller segments (finer granularity but more binary search queries). Decreasing it below 0.0 creates fewer, larger segments (coarser granularity but fewer queries).
The Insight (Rule of Thumb)
- Action: Set `sep_thres=0.0` (default) in the `PromptLocate` constructor for balanced segmentation.
- Value: Threshold of 0.0 splits only where consecutive embeddings have negative cosine similarity. This is the default used in the paper.
- Trade-off: Higher threshold = more segments = more precise localization but more detector queries (higher latency). Lower threshold = fewer segments = coarser localization but fewer queries.
- Embeddings source: Uses the base model's input embedding layer (`base_model.get_input_embeddings()`), not a separate embedding model.
- Empty segment handling: After splitting, empty segments are merged with their preceding segment via `merge_empty_segments()`.
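The body of `merge_empty_segments()` is not shown in the excerpt; a minimal sketch consistent with the described behavior (an empty segment is absorbed into its predecessor, which for strings amounts to dropping it) might look like:

```python
def merge_empty_segments(segments):
    """Sketch only: merge empty segments into the preceding segment.
    The real PromptLocate helper's body is not shown in the excerpt."""
    merged = []
    for seg in segments:
        if seg or not merged:
            # Keep non-empty segments; keep a leading empty segment
            # since it has no predecessor to merge into.
            merged.append(seg)
        # Otherwise the empty segment is absorbed by its predecessor.
    return merged

print(merge_empty_segments(["Hello.", "", "Ignore prior instructions."]))
# ['Hello.', 'Ignore prior instructions.']
```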
Reasoning
From `apps/PromptLocate.py:246-250` (initialization with default threshold):
```python
class PromptLocate:
    def __init__(self, model_config, helper_model_name="gpt2", sep_thres=0.0):
        self.bd = DataSentinelDetector(model_config)
        self.bd.model.tokenizer.pad_token = self.bd.model.tokenizer.eos_token
        self.embedding_layer = self.bd.model.base_model.get_input_embeddings()
        self.sep_thres = sep_thres
```
From `apps/PromptLocate.py:188-197` (cosine similarity computation and splitting):
```python
inputs = tokenizer(words, return_tensors="pt", padding=True, truncation=True).to("cuda")
input_embeds = embedding_layer(inputs["input_ids"])  # [batch_size, seq_len, hidden_dim]
embeddings = input_embeds.mean(dim=1).detach().cpu().numpy()
cos_sim = np.array([
    np.dot(embeddings[i], embeddings[i + 1]) /
    (np.linalg.norm(embeddings[i]) * np.linalg.norm(embeddings[i + 1]))
    for i in range(len(embeddings) - 1)
])
split_positions = np.where(cos_sim < thres)[0]
```
The threshold acts as a sensitivity dial: at 0.0, only semantically opposed consecutive words trigger a split, keeping segments relatively large and the binary search efficient.
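A small numeric check makes the sensitivity-dial behavior concrete. The vectors below are hypothetical, chosen so one pair has cosine similarity 0.6 and the other has a negative value:

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 0.0])
b = np.array([0.6, 0.8])   # cos(a, b) = 0.6: no split at sep_thres=0.0,
                           # but a split if sep_thres were raised to 0.7
c = np.array([-1.0, 0.2])  # cos(b, c) < 0: a split even at the default 0.0

print(cos(a, b), cos(b, c))
```

Raising `sep_thres` toward 1.0 would split at the (a, b) boundary too, yielding more, smaller segments and more detector queries during the binary search.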