
Heuristic:Liu00222 Open Prompt Injection Defense Strategy Selection

From Leeroopedia
Knowledge Sources
Domains Security, NLP, Optimization
Last Updated 2026-02-14 15:30 GMT

Overview

Decision framework for selecting among 12+ prompt injection defense strategies based on resource availability, task type, and acceptable overhead.

Description

The Open-Prompt-Injection toolkit implements a wide range of defense strategies against prompt injection attacks. These defenses fall into several categories: prompt engineering defenses (sandwich, instructional, delimiters, XML, random_seq), pre-processing defenses (paraphrasing, retokenization), detection-based defenses (PPL, known-answer, LLM-based), response filtering (response-based), and specialized model-based defenses (DataSentinel, PromptLocate). Each defense has different resource requirements, compatibility constraints, and effectiveness trade-offs.
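The grouping above can be sketched as a simple lookup table. The defense names match the strings used throughout this page; the dict and helper below are a reader aid, not a structure taken from the toolkit:

```python
# Illustrative grouping of Open-Prompt-Injection defenses by category.
# This mapping is an assumption for exposition, not toolkit code.
DEFENSE_CATEGORIES = {
    "prompt_engineering": ["sandwich", "instructional", "delimiters", "xml", "random_seq"],
    "preprocessing": ["paraphrasing", "retokenization"],
    "detection": ["ppl", "known-answer", "llm-based"],
    "response_filtering": ["response-based"],
    "model_based": ["DataSentinel", "PromptLocate"],
}

def category_of(defense: str) -> str:
    """Return the category a given defense name belongs to."""
    for category, defenses in DEFENSE_CATEGORIES.items():
        if defense in defenses:
            return category
    raise KeyError(f"unknown defense: {defense}")
```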

Usage

Use this heuristic when choosing which defense strategy to deploy in an LLM-integrated application. The choice depends on: whether you have GPU resources, whether the task is classification or generation, how much latency overhead is acceptable, and whether you need to detect or localize injections.

The Insight (Rule of Thumb)

  • Zero-cost defenses (no extra model calls): `sandwich`, `instructional`, `delimiters`, `xml`, `random_seq` — these modify the prompt template only. Use as a baseline.
  • One extra model call: `paraphrasing` (preprocesses user data), `llm-based` (pre-query detection), `known-answer` (echo test with secret token "DGDSGNH").
  • Requires surrogate GPU model: `ppl` (perplexity-based detection) loads Vicuna-7B-v1.3 as a surrogate, needing CUDA + ~9GiB VRAM.
  • Requires fine-tuned model: DataSentinel and PromptLocate require a fine-tuned QLoRA checkpoint on Mistral-7B.
  • Response-based filtering: Only works for classification tasks (sst2, mrpc, rte, sms_spam, hsol). Not applicable to generation tasks (gigaword, jfleg).
  • Retokenization: Requires the BPE merge table file at `./data/subword_nmt.voc`. Uses dropout rate 0.1 with up to 10 retries.
  • Trade-off: Prompt engineering defenses add zero latency but offer limited effectiveness. Model-based defenses (PPL, DataSentinel) are more robust but require a GPU and add inference latency.
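The rule of thumb above can be condensed into a small decision helper. The function name, signature, and return convention are hypothetical; the real toolkit selects a defense via a string passed to `Application`:

```python
def suggest_defense(has_gpu: bool, has_finetuned_ckpt: bool,
                    is_classification: bool, latency_budget_calls: int) -> list:
    """Suggest candidate defenses following the rule of thumb above.

    latency_budget_calls: how many extra model calls per query are acceptable.
    Hypothetical helper for illustration; not part of Open-Prompt-Injection.
    """
    # Zero-cost prompt-engineering defenses are always a reasonable baseline.
    candidates = ["sandwich", "instructional", "delimiters", "xml", "random_seq"]
    if latency_budget_calls >= 1:
        candidates += ["paraphrasing", "llm-based", "known-answer"]
    if is_classification and latency_budget_calls >= 1:
        candidates.append("response-based")  # classification tasks only
    if has_gpu:
        candidates.append("ppl")  # needs Vicuna-7B surrogate, ~9GiB VRAM
    if has_gpu and has_finetuned_ckpt:
        candidates += ["DataSentinel", "PromptLocate"]  # QLoRA ckpt on Mistral-7B
    return candidates
```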

Reasoning

The defense strategy selection is encoded in the `Application.__defense_preparation()` method (`apps/Application.py:59-100`) which branches on the defense string. The response-based filter explicitly excludes generative tasks:

From `apps/Application.py:60`:

if self.defense == 'response-based' and self.task.dataset not in ['gigaword', 'jfleg']:
    self.response_based_filter = {
        'sst2':eval_sst2,
        'mrpc':eval_mrpc,
        'rte':eval_rte,
        'sms_spam':eval_spam,
        'hsol':eval_hsol
    }[self.task.dataset]
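How the selected filter is then applied can be sketched as follows. The checker here is a stand-in for the toolkit's `eval_sst2`-style evaluators, and the wrapper function is an illustrative assumption:

```python
def eval_sst2_stub(response: str) -> bool:
    """Stand-in for eval_sst2: accept only valid SST-2 labels."""
    return response.strip().lower() in {"positive", "negative"}

def filter_response(response: str, checker) -> str:
    """Response-based defense: if the model's output is not a valid label
    for the task, treat it as evidence of a hijacked prompt and block it.
    Hypothetical wrapper for illustration."""
    return response if checker(response) else "[blocked: invalid task output]"
```

This also makes clear why the defense cannot cover gigaword or jfleg: for open-ended summarization or grammar correction there is no closed label set to validate against.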

The PPL defense requires loading a full 7B model as surrogate:

From `apps/Application.py:84-93`:

self.surrogate_backbone, self.surrogate_tokenizer = load_model(
    'lmsys/vicuna-7b-v1.3',
    "cuda",
    8,
    "9GiB",
    ...
)
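The detection principle is that injected instructions tend to have higher perplexity under the surrogate model than clean task data. The thresholding step can be sketched self-contained, with token log-probabilities standing in for the surrogate's output (all names and the threshold convention here are illustrative assumptions):

```python
import math

def perplexity(token_logprobs: list) -> float:
    """Perplexity = exp(-mean token log-probability) over the sequence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def flag_injection(token_logprobs: list, threshold: float) -> bool:
    """Flag the data segment as suspicious if its perplexity exceeds a
    threshold calibrated on clean data. Illustrative only; the real
    defense scores text with Vicuna-7B-v1.3 as the surrogate."""
    return perplexity(token_logprobs) > threshold
```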

The known-answer defense uses a hardcoded secret token with a TODO noting it should be dynamically generated:

From `apps/Application.py:148-149`:

# TODO: replace hard-coded secret data with one generated on-the-fly
secret_data = "DGDSGNH"
