Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Liu00222 Open Prompt Injection Prompt Localization

From Leeroopedia
Revision as of 17:31, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Liu00222_Open_Prompt_Injection_Prompt_Localization.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains Prompt_Injection, Security, Defense
Last Updated 2026-02-14 15:00 GMT

Overview

A defense pipeline that localizes the exact boundaries of injected content within a contaminated prompt and recovers the original clean data by excising the identified injection regions.

Description

Prompt Localization (PromptLocate) goes beyond binary injection detection to answer where within the text the injection occurs. It combines three techniques: (1) Text segmentation using embedding similarity and spaCy sentence splitting, (2) Binary search with DataSentinel detection to narrow down injection boundaries to individual segments, and (3) Causal influence analysis using a helper LM (GPT-2) to identify the exact end of injected data. Once injection regions are identified, they are excised and the remaining clean text is returned for safe processing.

Usage

Use this principle as a second-stage defense after DataSentinel detects contamination. It provides fine-grained localization that enables data recovery rather than simple rejection, allowing the application to still process the clean portions of the input.

Theoretical Basis

The pipeline operates in three phases:

Pseudo-code Logic:

# Phase 1: Segment the text
segments = split_by_sentences_and_embeddings(text)

# Phase 2: Binary search for injection boundaries
for each detected injection region:
    start = binary_search(segments, detector, "find injection start")
    end = find_data_end(segments, start, causal_influence_model)

# Phase 3: Recover clean data
clean_text = remove_segments(segments, injection_regions)
injected_text = extract_segments(segments, injection_regions)
return (clean_text, injected_text)

The binary search leverages DataSentinel: concatenating segments and querying the detector. If a half-range is detected as contaminated, the injection start lies within it. Causal influence uses GPT-2 to determine where natural data flow is disrupted.

Related Pages

Implemented By

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment