
Heuristic:Liu00222 Open Prompt Injection Cosine Similarity Segmentation Threshold

From Leeroopedia
Knowledge Sources
Domains NLP, Security, Optimization
Last Updated 2026-02-14 15:30 GMT

Overview

Text segmentation for injection localization places a segment boundary wherever the cosine similarity between consecutive word embeddings drops below a threshold, with a default threshold of 0.0.

Description

The PromptLocate defense splits input text into segments before performing binary search injection localization. The segmentation uses a two-stage approach: first, spaCy sentence segmentation with a custom regex-based sentence boundary detector; second, within each sentence, consecutive word embeddings are compared using cosine similarity, and segment boundaries are placed where similarity drops below a threshold. The default threshold of 0.0 means that segments are only split when consecutive word embeddings point in opposite directions (negative cosine similarity), preserving most sentence-level coherence.
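The word-level stage can be sketched end to end with a toy example. `split_on_similarity` is a hypothetical helper name, and the random-looking 2-D vectors below stand in for real word embeddings, which in PromptLocate come from the base model's input embedding layer:

```python
import numpy as np

def split_on_similarity(words, embeddings, thres=0.0):
    """Split a word list into segments wherever the cosine similarity
    between consecutive word embeddings drops below `thres`."""
    cos_sim = np.array([
        np.dot(embeddings[i], embeddings[i + 1]) /
        (np.linalg.norm(embeddings[i]) * np.linalg.norm(embeddings[i + 1]))
        for i in range(len(embeddings) - 1)
    ])
    split_positions = np.where(cos_sim < thres)[0]
    segments, start = [], 0
    for pos in split_positions:
        segments.append(words[start:pos + 1])  # boundary falls AFTER index pos
        start = pos + 1
    segments.append(words[start:])
    return segments

# Toy vectors: the first three roughly align, the last two point the
# opposite way, so one pair has negative cosine similarity.
words = ["ignore", "previous", "instructions", "and", "reply"]
emb = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
                [-1.0, 0.1], [-0.9, 0.0]])
print(split_on_similarity(words, emb, thres=0.0))
# -> [['ignore', 'previous', 'instructions'], ['and', 'reply']]
```

At the default threshold of 0.0, only the single sign-flipping pair triggers a split, leaving two coherent segments.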

Usage

Use this heuristic when configuring or debugging PromptLocate segmentation. The threshold is set via the `sep_thres` parameter in the `PromptLocate` constructor (default 0.0). Increasing the threshold creates more, smaller segments (finer granularity but more binary search queries). Decreasing it below 0.0 creates fewer, larger segments (coarser granularity but fewer queries).

The Insight (Rule of Thumb)

  • Action: Set `sep_thres=0.0` (default) in the `PromptLocate` constructor for balanced segmentation.
  • Value: Threshold of 0.0 splits only where consecutive embeddings have negative cosine similarity. This is the default used in the paper.
  • Trade-off: Higher threshold = more segments = more precise localization but more detector queries (higher latency). Lower threshold = fewer segments = coarser localization but fewer queries.
  • Embeddings source: Uses the base model's input embedding layer (`base_model.get_input_embeddings()`), not a separate embedding model.
  • Empty segment handling: After splitting, empty segments are merged with their preceding segment via `merge_empty_segments()`.
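The repository's `merge_empty_segments()` is not reproduced on this page; a minimal sketch of the behavior described above (an empty segment is absorbed into the segment before it) might look like:

```python
def merge_empty_segments(segments):
    """Hypothetical reimplementation: fold each empty or whitespace-only
    segment into the segment that precedes it."""
    merged = []
    for seg in segments:
        if not seg.strip() and merged:
            merged[-1] += seg  # absorb the empty text into the predecessor
        else:
            merged.append(seg)
    return merged

print(merge_empty_segments(["Hello world.", "", "Ignore this."]))
# -> ['Hello world.', 'Ignore this.']
```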

Reasoning

From `apps/PromptLocate.py:246-250` (initialization with default threshold):

class PromptLocate:
    def __init__(self, model_config, helper_model_name="gpt2", sep_thres=0.0):
        self.bd = DataSentinelDetector(model_config)
        self.bd.model.tokenizer.pad_token = self.bd.model.tokenizer.eos_token
        self.embedding_layer = self.bd.model.base_model.get_input_embeddings()
        self.sep_thres = sep_thres

From `apps/PromptLocate.py:188-197` (cosine similarity computation and splitting):

inputs = tokenizer(words, return_tensors="pt", padding=True, truncation=True).to("cuda")
input_embeds = embedding_layer(inputs["input_ids"])  # [batch_size, seq_len, hidden_dim]
embeddings = input_embeds.mean(dim=1).detach().cpu().numpy()

cos_sim = np.array([
    np.dot(embeddings[i], embeddings[i + 1]) /
    (np.linalg.norm(embeddings[i]) * np.linalg.norm(embeddings[i + 1]))
    for i in range(len(embeddings) - 1)
])
split_positions = np.where(cos_sim < thres)[0]

The threshold acts as a sensitivity dial: at 0.0, only semantically opposed consecutive words trigger a split, keeping segments relatively large and the binary search efficient.
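The sensitivity-dial effect can be seen directly on a toy sequence of consecutive-word cosine similarities (the values below are illustrative, not taken from the paper):

```python
import numpy as np

# Illustrative cosine similarities between consecutive words in one sentence.
cos_sim = np.array([0.85, 0.30, -0.05, 0.60, 0.10, -0.40])

for thres in (-0.2, 0.0, 0.2):
    n_splits = int((cos_sim < thres).sum())
    print(f"thres={thres:+.1f}: {n_splits} split(s) -> {n_splits + 1} segments")
# thres=-0.2: 1 split(s) -> 2 segments
# thres=+0.0: 2 split(s) -> 3 segments
# thres=+0.2: 3 split(s) -> 4 segments
```

Raising the threshold admits mildly dissimilar pairs as boundaries (more, smaller segments and more detector queries); lowering it keeps only strongly opposed pairs (fewer, larger segments).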
