Principle:Liu00222 Open Prompt Injection Causal Influence Analysis

From Leeroopedia
Knowledge Sources
Domains: NLP, Causal_Inference, Language_Modeling
Last Updated: 2026-02-14 15:00 GMT

Overview

A technique that measures whether a text segment disrupts the natural continuation of surrounding text by comparing conditional probabilities from a language model with and without the suspected segment.

Description

Causal Influence Analysis decides whether a middle segment of text is a natural continuation of its context or an injection by measuring how much it disrupts the text that follows. A helper language model (GPT-2) computes the average log-probability of a suffix segment twice: once conditioned on the prefix alone, and once conditioned on the prefix plus the suspected segment. If including the suspected segment significantly lowers the suffix's average log-probability (a positive influence score), the segment is likely injected content, because it disrupts the natural flow of the text.

Usage

Use this principle within the binary search localization pipeline to determine the end boundary of an injection region. After binary search finds the injection start, causal influence analysis scans subsequent segments to find where injected content ends and natural data resumes.
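The end-boundary scan described above can be sketched as follows. This is only one plausible reading, not the pipeline's actual code: the function name `find_injection_end`, the segment-level granularity, the choice of prefix and suffix for each candidate, the `threshold` default, and the toy scorer are all assumptions made for illustration. The scorer is passed in as a callable so that any causal influence implementation can be plugged in.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: (prefix segments, candidate segment, suffix segments) -> CI score.
ScoreFn = Callable[[Sequence[str], str, Sequence[str]], float]


def find_injection_end(
    segments: List[str],
    start: int,
    ci_score: ScoreFn,
    threshold: float = 0.0,  # assumed cut-off; the source only says "positive"
) -> int:
    """Return the exclusive end index of the injected region.

    `start` is the index of the first injected segment (e.g. from binary search).
    Segments after `start` are scanned in order; the first one whose score is
    not above `threshold` is treated as the point where natural data resumes.
    """
    prefix = segments[:start]  # assumed clean context before the injection
    # The last segment has no following text to score against, so it is not scanned.
    for i in range(start + 1, len(segments) - 1):
        suffix = segments[i + 1:]
        if ci_score(prefix, segments[i], suffix) <= threshold:
            return i  # segment i looks natural: the injection ended just before it
    return len(segments)  # no natural segment found: assume injection runs to the end


def toy_score(prefix: Sequence[str], candidate: str, suffix: Sequence[str]) -> float:
    # Stand-in for a real model-based score: flags segments marked "INJ".
    return 1.0 if candidate.startswith("INJ") else -0.5


segments = ["a", "b", "INJ1", "INJ2", "c", "d"]
end = find_injection_end(segments, start=2, ci_score=toy_score)
injected_region = segments[2:end]
```

With the toy scorer, the scan stops at the first segment that no longer scores above the threshold, so `injected_region` covers the two marked segments.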

Theoretical Basis

The causal influence score is defined as:

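$$\mathrm{CI}(\text{injected}) = \frac{1}{|\text{suffix}|} \sum_{t} \log P(w_t \mid \text{prefix}) - \frac{1}{|\text{suffix}|} \sum_{t} \log P(w_t \mid \text{prefix} + \text{injected})$$

Where:

  • $P(w_t \mid \text{prefix})$ is the probability of suffix token $w_t$ given only the clean prefix
  • $P(w_t \mid \text{prefix} + \text{injected})$ is the probability of the same token given the prefix plus the suspected injection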

A positive CI score indicates the suspected segment disrupts natural continuation, suggesting it is injected content.
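As a minimal sketch of the score itself, the snippet below computes the difference of the two average log-probabilities. It is model-agnostic on purpose: `log_probs` is a hypothetical callable that returns one log-probability per suffix token for a given context, and `toy_log_probs` is a made-up stand-in for a real language model such as GPT-2, used here only so the example runs without downloading weights.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer type: (context tokens, suffix tokens) -> log P(w_t | context) per suffix token.
LogProbFn = Callable[[Sequence[str], Sequence[str]], List[float]]


def causal_influence(
    prefix: Sequence[str],
    injected: Sequence[str],
    suffix: Sequence[str],
    log_probs: LogProbFn,
) -> float:
    """Average suffix log-prob given the prefix, minus the same given prefix + injected."""
    if not suffix:
        raise ValueError("suffix must contain at least one token")
    clean = log_probs(prefix, suffix)
    disrupted = log_probs(list(prefix) + list(injected), suffix)
    n = len(suffix)
    return sum(clean) / n - sum(disrupted) / n


def toy_log_probs(context: Sequence[str], suffix: Sequence[str]) -> List[float]:
    # Toy assumption: every suffix token becomes less likely right after "IGNORE".
    p = 0.125 if context and context[-1] == "IGNORE" else 0.5
    return [math.log(p)] * len(suffix)


prefix = ["the", "cat"]
suffix = ["sat", "down"]
score_injected = causal_influence(prefix, ["IGNORE"], suffix, toy_log_probs)
score_natural = causal_influence(prefix, ["quietly"], suffix, toy_log_probs)
```

Under the toy model, the disruptive segment yields a positive score and the neutral segment yields zero. A real scorer would replace `toy_log_probs` with per-token log-probabilities read from the language model's output.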

Related Pages

Implemented By
