Principle:Microsoft BIPIA Prompt Construction With Defense

Field	Value
Sources	BIPIA paper
Domains	NLP, Security, Prompt_Engineering
Last Updated	2026-02-14

Overview

A defense-augmented prompt construction methodology that wraps external content with border delimiters and prepends few-shot demonstration examples to test prompts.

Description

This principle extends the standard prompt construction with two defense modifications:

Modification 1: Border Delimiters. The user prompt is processed through add_border(), which inserts configurable delimiter strings around the external content (context). The border type is selected at initialization and determines the visual separator:

"=" inserts lines of equals signs before and after the context.
"-" inserts lines of dashes before and after the context.
"code" inserts triple backtick fences around the context.
"empty" applies no border (baseline behavior).

These delimiters provide a structural cue that signals to the model where untrusted external content begins and ends.

Modification 2: Few-Shot Example Prepending. The final message is constructed by concatenating the system prompt, few-shot example messages, and the bordered user prompt. The example messages demonstrate the desired behavior of answering the original task while ignoring injected attacks. This provides a behavioral cue that complements the structural border cue.

The layered approach ensures the model sees both structural (borders) and behavioral (few-shot examples) defense cues simultaneously.

Usage

Use when constructing defended prompts for test-time evaluation of black-box defense effectiveness. This principle applies specifically to scenarios where the model is accessed through an API and no parameter modification is possible. The defense is configured through two parameters: border_type (which delimiter style to use) and num_examples (how many few-shot demonstrations to include).

Theoretical Basis

The defended prompt structure follows this template:

[system_prompt]
+ [example_user + example_assistant] x N
+ [bordered_user_prompt]

Border delimiters provide visual separation of untrusted content. The hypothesis is that explicit boundaries help the model distinguish between the user's instruction and external data that may contain adversarial injections. Different border types (equals, dashes, code fences) offer varying degrees of visual salience.

Few-shot examples leverage in-context learning to demonstrate the correct behavioral pattern. By observing N examples where attacks are present in the external content but the assistant responds only to the user's original question, the model is primed to replicate this behavior on unseen test inputs.

The defense is purely prompt-level, requiring no model modification, fine-tuning, or access to model weights. This makes it applicable to any black-box LLM that supports multi-turn conversation or text completion APIs.

Composability. The two defense mechanisms are orthogonal and compose naturally. Borders are applied within each example message as well as in the final test prompt, ensuring visual consistency across all messages in the context window.

Related Pages

Implementation:Microsoft_BIPIA_FewShotChatGPT35Defense_Process_Fn

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment