Principle:Liu00222 Open Prompt Injection Conditional Probability Computation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Language_Modeling, Probability |
| Last Updated | 2026-02-14 15:00 GMT |
Overview
A technique for computing the conditional log-probability of a target text sequence given a conditioning prefix using an autoregressive language model.
Description
Conditional Probability Computation extracts the log-probability that a language model assigns to a target text sequence given a conditioning prefix. This is the fundamental building block of causal influence analysis: by comparing the conditional probability of a suffix under different prefixes, we can determine whether an intervening segment is a natural continuation or an injection. The computation uses teacher forcing (feeding the model the actual target tokens and reading off their predicted probabilities) rather than free generation.
Usage
Use this principle as a utility within causal influence analysis. It is called twice per influence score: once conditioning on just the clean prefix and once conditioning on the prefix plus the suspected injection, scoring the same suffix both times.
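The two-call pattern above can be sketched as follows. This is a minimal illustration, not the reference implementation: the name `influence_score` and the signature of the `cond_logprob` callable (prefix and target text, returning an average log-probability) are assumptions for the sketch.

```python
def influence_score(cond_logprob, clean_prefix, segment, suffix):
    """Hypothetical influence score: how much does inserting `segment`
    change the model's probability of the same `suffix`?

    cond_logprob(prefix, target) is assumed to return the average
    conditional log-probability of `target` given `prefix`.
    """
    # Call 1: score the suffix given the prefix plus the suspected injection.
    with_segment = cond_logprob(clean_prefix + segment, suffix)
    # Call 2: score the same suffix given just the clean prefix.
    without_segment = cond_logprob(clean_prefix, suffix)
    # A large drop suggests the segment disrupts the natural continuation.
    return with_segment - without_segment
```

Only the difference between the two calls matters, so any length normalization applied inside `cond_logprob` cancels out of constant factors and the score can be compared across candidate segments.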
Theoretical Basis
For an autoregressive model with parameters θ and vocabulary V, the conditional log-probability of a target sequence t = (t_1, …, t_{|t|}) given a condition c factorizes token by token:

$$\log P(t \mid c) = \sum_{i=1}^{|t|} \log P_\theta(t_i \mid c, t_{<i})$$

The average log-probability normalizes by sequence length, making scores comparable across targets of different lengths:

$$\overline{\log P}(t \mid c) = \frac{1}{|t|} \sum_{i=1}^{|t|} \log P_\theta(t_i \mid c, t_{<i})$$
In practice, the function concatenates condition and target, runs a forward pass, and extracts log-probabilities only for the target token positions using `log_softmax` over the logits.
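A minimal sketch of that computation, using NumPy in place of a deep-learning framework. The `logits_fn` callable stands in for a real model's forward pass (it maps a token-id sequence to a `(sequence_length, |V|)` logit matrix); the function names and the toy model in the usage note are assumptions, not part of the original method.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def conditional_logprob(logits_fn, cond_ids, target_ids, average=True):
    # Teacher forcing: run the concatenation of condition and target
    # through the model once, then read off the log-probabilities the
    # model assigned to the actual target tokens.
    ids = cond_ids + target_ids
    logps = log_softmax(logits_fn(ids))   # shape (len(ids), |V|)
    # The logits at position i predict token i + 1, so target_ids[j]
    # is predicted at position len(cond_ids) - 1 + j.
    start = len(cond_ids) - 1
    total = sum(logps[start + j, tok] for j, tok in enumerate(target_ids))
    return total / len(target_ids) if average else total
```

For a quick check, a bigram lookup table serves as a stand-in model whose logits at each position depend only on the current token:

```python
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 10))             # toy bigram logit table, |V| = 10
toy_model = lambda ids: W[np.array(ids)]  # row i: logits for the next token
score = conditional_logprob(toy_model, [1, 2], [3, 4])
```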