Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Microsoft BIPIA FewShotChatGPT35Defense

From Leeroopedia
Field Value
Sources Microsoft BIPIA Repository
Domains NLP, Security, Defense
Last Updated 2026-02-14

Overview

Concrete tool for applying border string and few-shot in-context learning defenses to GPT-3.5 API models provided by the BIPIA defense module.

Description

FewShotChatGPT35Defense extends the GPT35WOSystem base class to add defense mechanisms against indirect prompt injection attacks. The class is responsible for:

  • Maintaining train/test dataset splits -- It accepts a DatasetDict containing both training and testing splits. The training split serves as the pool from which few-shot defense examples are drawn.
  • Randomly selecting few-shot examples -- At initialization, the class samples num_examples entries from the training set (optionally seeded for reproducibility) to serve as in-context demonstrations.
  • Wrapping external content with configurable border strings -- The add_border() method inserts border delimiters around untrusted external content within the prompt, using the configured border type (empty, equals, dashes, or code fences).
  • Constructing few-shot example messages -- The construct_example() method builds the sequence of example messages that demonstrate correct model behavior (ignoring injected attacks) and caches them for reuse across test prompts.
  • Overriding process_fn() -- The overridden method prepends the constructed few-shot defense context to each test prompt before inference, ensuring every query sent to the API includes the defense demonstrations and bordered content.

The class inherits the GPT model's generate() method for actual inference, so the defense is purely a prompt-construction layer that does not modify the underlying API call mechanics.

Usage

Instantiate FewShotChatGPT35Defense with a GPT config path, an accelerator, a DatasetDict containing train/test splits, the desired number of few-shot examples, and a border type. Then call construct_example() to build the few-shot context from the training set. Finally, use process_fn() to augment individual test examples with the defense context and generate() to run defended inference.

Code Reference

Property Value
Source BIPIA repository
File defense/black_box/few_shot.py
Lines L160-280

Signatures

FewShotChatGPT35Defense.__init__(
    self,
    config,
    accelerator,
    dataset,
    num_examples: int = 1,
    seed: int = None,
    border_type: str = ""
)
FewShotChatGPT35Defense.add_border(
    self,
    user_prompt: str,
    context: str
) -> str
FewShotChatGPT35Defense.construct_example(
    self,
    prompt_construct_fn: Callable,
    response_construct_fn: Callable
) -> None
FewShotChatGPT35Defense.process_fn(
    self,
    example: dict,
    prompt_construct_fn: Callable
) -> dict

Import

from defense.black_box.few_shot import FewShotChatGPT35Defense

Alternatively, the class is instantiated within the defense script and does not need to be imported directly by end users.

I/O Contract

Inputs

Parameter Type Required Description
config str Yes Path to a GPT configuration YAML file.
accelerator Accelerator Yes HuggingFace Accelerate instance for distributed setup.
dataset DatasetDict Yes Dataset with train and test splits.
num_examples int No (default: 1) Number of few-shot examples to sample from the training set.
seed int No Random seed for reproducible example selection.
border_type str No (default: "empty") Border delimiter type. One of "empty", "=", "-", or "code".

Outputs

Initialization produces a fully configured defense wrapper. Calling process_fn() returns a modified example dict whose message field has been augmented with few-shot defense context and bordered external content, ready for submission to the GPT-3.5 API via generate().

Usage Examples

from datasets import DatasetDict, Dataset
from defense.black_box.few_shot import FewShotChatGPT35Defense

# Assume train_data and test_data are lists of dicts with prompt/context/response keys
dataset = DatasetDict({
    "train": Dataset.from_list(train_data),
    "test": Dataset.from_list(test_data),
})

# Initialize the defense with 3 few-shot examples and equals-sign borders
defense = FewShotChatGPT35Defense(
    config="configs/gpt35.yaml",
    accelerator=accelerator,
    dataset=dataset,
    num_examples=3,
    seed=42,
    border_type="="
)

# Build the few-shot defense context from the training split
defense.construct_example(
    prompt_construct_fn=my_prompt_builder,
    response_construct_fn=my_response_builder,
)

# Apply the defense to every example in the test split
defended_test = dataset["test"].map(
    lambda example: defense.process_fn(example, prompt_construct_fn=my_prompt_builder)
)

# Run defended inference
results = defense.generate(defended_test)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment