Principle:Liu00222 Open Prompt Injection Combined Attack Injection
| Knowledge Sources | |
|---|---|
| Domains | Prompt_Injection, Adversarial_ML, Security |
| Last Updated | 2026-02-14 15:00 GMT |
Overview
A composite prompt injection attack that combines fake task completion, instruction override, and direct injection into a single adversarial payload to maximize attack success rate.
Description
Combined Attack Injection is the most sophisticated attack strategy in the framework. It chains three attack primitives: (1) Fake Completion — generating a plausible output for the target task to make the model believe its original task is done, (2) Ignore Override — inserting "Ignore previous instructions" to reset the model context, and (3) Direct Injection — appending the attacker's instruction and data. This combination exploits multiple failure modes simultaneously: context boundary confusion, instruction following priority, and attention dilution.
Usage
Use this principle when evaluating the worst-case attack scenario against an LLM application. The combined attack represents the upper bound of attack effectiveness and is the default attack strategy in the framework.
Theoretical Basis
The combined attack constructs an adversarial prompt with the following structure:
Pseudo-code Logic:
# Combined attack prompt structure
attacked_prompt = (
clean_data # Original user data
+ "\n" + fake_completion_text # Fake output for target task
+ "\nIgnore previous instructions. " # Context reset
+ inject_instruction # Attacker's task instruction
+ " " + inject_data # Attacker's data
)
The fake completion text is task-specific (e.g., "positive" for sentiment analysis, "not spam" for spam detection) to match what the model would naturally output for the target task.