Principle:Liu00222 Open Prompt Injection Conditional Probability Computation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Language_Modeling, Probability |
| Last Updated | 2026-02-14 15:00 GMT |
Overview
A technique for computing the conditional log-probability of a target text sequence given a conditioning prefix using an autoregressive language model.
Description
Conditional Probability Computation extracts the log-probability that a language model assigns to a target text sequence given a conditioning prefix. This is the fundamental building block of causal influence analysis: by comparing the conditional probability of a suffix under different prefixes, we can determine whether an intervening segment is a natural continuation or an injection. The computation uses teacher forcing (feeding the model the actual target tokens and reading off their predicted probabilities) rather than free generation.
Usage
Use this principle as a utility within causal influence analysis. It is called twice per influence score: once conditioning on just the clean prefix and once conditioning on the prefix plus the suspected injection, scoring the same suffix both times.
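The two-call pattern above can be sketched as follows. This is a minimal illustration, not the reference implementation: the name `influence_score` and the signature of the `cond_logprob` callable (prefix and target text, returning an average log-probability) are assumptions for the sketch.

```python
def influence_score(cond_logprob, clean_prefix, segment, suffix):
    """Hypothetical influence score: how much does inserting `segment`
    change the model's probability of the same `suffix`?

    cond_logprob(prefix, target) is assumed to return the average
    conditional log-probability of `target` given `prefix`.
    """
    # Call 1: score the suffix given the prefix plus the suspected injection.
    with_segment = cond_logprob(clean_prefix + segment, suffix)
    # Call 2: score the same suffix given just the clean prefix.
    without_segment = cond_logprob(clean_prefix, suffix)
    # A large drop suggests the segment disrupts the natural continuation.
    return with_segment - without_segment
```

Only the difference between the two calls matters, so any length normalization applied inside `cond_logprob` cancels out of constant factors and the score can be compared across candidate segments.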
Theoretical Basis
For an autoregressive model with parameters θ and vocabulary V, the conditional log-probability of a target sequence t = (t_1, …, t_{|t|}) given a condition c factorizes token by token:

$$\log P(t \mid c) = \sum_{i=1}^{|t|} \log P_\theta(t_i \mid c, t_{<i})$$

The average log-probability normalizes by sequence length, making scores comparable across targets of different lengths:

$$\overline{\log P}(t \mid c) = \frac{1}{|t|} \sum_{i=1}^{|t|} \log P_\theta(t_i \mid c, t_{<i})$$
In practice, the function concatenates condition and target, runs a forward pass, and extracts log-probabilities only for the target token positions using `log_softmax` over the logits.
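A minimal sketch of that computation, using NumPy in place of a deep-learning framework. The `logits_fn` callable stands in for a real model's forward pass (it maps a token-id sequence to a `(sequence_length, |V|)` logit matrix); the function names and the toy model in the usage note are assumptions, not part of the original method.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def conditional_logprob(logits_fn, cond_ids, target_ids, average=True):
    # Teacher forcing: run the concatenation of condition and target
    # through the model once, then read off the log-probabilities the
    # model assigned to the actual target tokens.
    ids = cond_ids + target_ids
    logps = log_softmax(logits_fn(ids))   # shape (len(ids), |V|)
    # The logits at position i predict token i + 1, so target_ids[j]
    # is predicted at position len(cond_ids) - 1 + j.
    start = len(cond_ids) - 1
    total = sum(logps[start + j, tok] for j, tok in enumerate(target_ids))
    return total / len(target_ids) if average else total
```

For a quick check, a bigram lookup table serves as a stand-in model whose logits at each position depend only on the current token:

```python
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 10))             # toy bigram logit table, |V| = 10
toy_model = lambda ids: W[np.array(ids)]  # row i: logits for the next token
score = conditional_logprob(toy_model, [1, 2], [3, 4])
```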