Implementation:Microsoft BIPIA AutoPIABuilder

Field	Value
Sources	Repository: Microsoft BIPIA
Domains	NLP, Security, Data_Engineering
Last Updated	2026-02-14

Overview

Concrete tool, provided by the BIPIA benchmark library, for constructing prompt injection attack datasets.

Description

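AutoPIABuilder is a factory class that maps task names to specialized builder classes within the BIPIA framework. It exposes a single class method, from_name(), which resolves a task name string to the corresponding builder class. It returns the class itself rather than an instance, and the caller then instantiates it (the usage examples below pass a seed). The supported mappings are: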

Task Name	Builder Class
`"code"`	`CodeIPIABuilder`
`"qa"`	`QAIPIADataset`
`"table"`	`TableIPIABuilder`
`"email"`	`EmailIPIABuilder`
`"abstract"`	`AbstractIPIADataset`
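
The dispatch performed by from_name() can be pictured as a dictionary lookup from task name to class. The following is a hypothetical sketch and not the BIPIA source. The stub classes only borrow the names from the table, the `seed` constructor argument comes from the usage examples on this page, and raising `ValueError` for an unknown name is an assumption of the sketch.

```python
from typing import Dict, Optional, Type


class BasePIABuilderStub:
    """Hypothetical stand-in for BasePIABuilder; it only stores the seed."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.seed = seed


# Hypothetical stubs named after the task-specific classes in the table above.
class CodeIPIABuilder(BasePIABuilderStub): ...
class QAIPIADataset(BasePIABuilderStub): ...
class TableIPIABuilder(BasePIABuilderStub): ...
class EmailIPIABuilder(BasePIABuilderStub): ...
class AbstractIPIADataset(BasePIABuilderStub): ...


# Task name -> builder class, mirroring the mapping table.
_REGISTRY: Dict[str, Type[BasePIABuilderStub]] = {
    "code": CodeIPIABuilder,
    "qa": QAIPIADataset,
    "table": TableIPIABuilder,
    "email": EmailIPIABuilder,
    "abstract": AbstractIPIADataset,
}


class AutoPIABuilderSketch:
    @classmethod
    def from_name(cls, name: str) -> Type[BasePIABuilderStub]:
        """Return the builder class (not an instance) registered for `name`."""
        try:
            return _REGISTRY[name]
        except KeyError:
            # Assumed behavior for unsupported task names.
            raise ValueError(f"Unknown task name: {name!r}") from None
```

In this sketch, `AutoPIABuilderSketch.from_name("qa")(seed=42)` first resolves the class and then instantiates it, which is the same two-step pattern the usage examples follow.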

Each builder inherits from BasePIABuilder and implements the logic for constructing poisoned datasets by combining clean task-specific contexts with adversarial attack strings at configurable insertion positions (start, middle, end). The factory pattern allows callers to work with a uniform interface regardless of the underlying task type, while each specialized builder handles task-specific context parsing and metadata preservation.
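
The three insertion positions can be illustrated with small string functions. These are hypothetical re-implementations written for this page. They are deliberately named differently from BIPIA's `insert_start`, `insert_middle` and `insert_end` because this page does not document those functions' exact signatures or separators. For example, the library's middle insertion may choose a different boundary than the whitespace split used here.

```python
def insert_at_start(context: str, attack: str) -> str:
    """Place the attack string before the clean context."""
    return f"{attack}\n{context}"


def insert_at_end(context: str, attack: str) -> str:
    """Place the attack string after the clean context."""
    return f"{context}\n{attack}"


def insert_at_middle(context: str, attack: str) -> str:
    """Place the attack string at the last space at or before the midpoint."""
    mid = len(context) // 2
    # Search backwards for a space so that no word is split in two.
    cut = context.rfind(" ", 0, mid + 1)
    if cut == -1:
        cut = mid  # No space before the midpoint: fall back to a raw split.
    return f"{context[:cut]} {attack} {context[cut:].lstrip()}"
```

For example, `insert_at_middle("alpha beta gamma delta", "ATTACK")` yields `"alpha beta ATTACK gamma delta"`.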

Usage

Import AutoPIABuilder when you need to create a prompt injection attack dataset for any of the five BIPIA benchmark tasks. Once instantiated, the builder handles context loading, attack loading, cross-product generation (each context combined with each attack at each insertion position), and DataFrame assembly in a single call.
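
The four pipeline stages can be sketched with the standard library alone. This is a hypothetical helper, not the builder's actual code, and it relies on several assumptions about the file formats. Each JSONL line is assumed to carry `context`, `ideal` and optionally `question` keys. The attacks JSON is assumed to map an attack category to a list of attack strings. The `"<category>-<index>"` scheme for `attack_name` is also an assumption. The helper returns row dictionaries whose keys match the output columns listed under Outputs.

```python
import itertools
import json
from typing import Callable, Dict, List, Sequence


def build_pia_records(
    contexts_path: str,
    attacks_path: str,
    task_name: str,
    insert_fns: Sequence[Callable[[str, str], str]],
    insert_fn_names: Sequence[str],
) -> List[Dict[str, str]]:
    # Context loading: one JSON object per non-empty line (assumed keys).
    with open(contexts_path, encoding="utf-8") as f:
        contexts = [json.loads(line) for line in f if line.strip()]

    # Attack loading: assumed {category: [attack string, ...]} layout.
    with open(attacks_path, encoding="utf-8") as f:
        attack_groups = json.load(f)
    attacks = [
        (f"{category}-{i}", attack_str)
        for category, strings in attack_groups.items()
        for i, attack_str in enumerate(strings)
    ]

    # Cross-product generation: every context x every attack x every position.
    positions = list(zip(insert_fns, insert_fn_names))
    records = []
    for sample, (attack_name, attack_str), (insert_fn, position) in itertools.product(
        contexts, attacks, positions
    ):
        records.append(
            {
                "context": insert_fn(sample["context"], attack_str),
                "attack_name": attack_name,
                "attack_str": attack_str,
                "task_name": task_name,
                "ideal": sample.get("ideal", ""),
                "question": sample.get("question", ""),
                "position": position,
            }
        )
    return records
```

The final stage, DataFrame assembly, is then `pd.DataFrame(records)`. The row count is always the product of the number of contexts, the number of attack strings and the number of insertion functions.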

Code Reference

Source Location: Repository: BIPIA, File: bipia/data/__init__.py, Lines: L10-31

Signature:

class AutoPIABuilder:
    @classmethod
    def from_name(cls, name: str) -> type[BasePIABuilder]:
        """Resolve a task name to its specialized builder class."""

class BasePIABuilder:
    def __call__(
        self,
        contexts: str,
        attacks: str,
        insert_fns=[insert_end, insert_start, insert_middle],
        insert_fn_names=["end", "start", "middle"],
        enable_stealth: bool = False
    ) -> pd.DataFrame: ...

Import:

from bipia.data import AutoPIABuilder

I/O Contract

Inputs:

Parameter	Type	Required	Description
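`name`	str	Yes	Task name passed to `from_name()`: `"code"`, `"qa"`, `"table"`, `"email"`, or `"abstract"`
`contexts`	str	Yes	Path to a JSONL file containing clean context data
`attacks`	str	Yes	Path to a JSON file containing attack definitions
`insert_fns`	list	No	List of insertion functions (defaults to `[insert_end, insert_start, insert_middle]`)
`insert_fn_names`	list	No	Position labels written to the `position` column, one per entry in `insert_fns` (defaults to `["end", "start", "middle"]`)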
`enable_stealth`	bool	No	Whether to base64-encode attack strings before insertion (defaults to `False`)
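
When `enable_stealth` is set, attack strings are base64-encoded before insertion. The encoding step amounts to the sketch below, which uses a hypothetical helper name. This page does not say whether BIPIA adds any wrapper text around the encoded payload, so the sketch shows only the encoding itself.

```python
import base64


def encode_attack_for_stealth(attack_str: str) -> str:
    """Base64-encode an attack string (UTF-8 bytes in, ASCII text out)."""
    return base64.b64encode(attack_str.encode("utf-8")).decode("ascii")
```

For instance, `encode_attack_for_stealth("hi")` returns `"aGk="`, and `base64.b64decode` recovers the original bytes.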

Outputs:

pd.DataFrame with the following columns:

Column	Description
`context`	The poisoned context (original context with injected attack string)
`attack_name`	Identifier for the attack type used
`attack_str`	The raw attack string that was injected
`task_name`	The task type (e.g., `"qa"`, `"code"`)
`ideal`	The ideal (correct) answer for the original clean task
`question`	The associated question (present for QA tasks)
`position`	Insertion position: `"end"`, `"start"`, or `"middle"`
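
The sketch below is a quick sanity check on a built dataset. It operates on a list of row dictionaries, such as the one `df.to_dict(orient="records")` produces. The helper name and its checks are illustrative only. The check that every `attack_str` appears in its `context` is valid only when stealth mode is off, because stealth mode injects an encoded string instead.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_pia_rows(rows: Iterable[Dict[str, str]]) -> Dict[str, object]:
    rows = list(rows)
    # Rows per insertion position; a full cross-product yields equal counts.
    per_position = Counter(row["position"] for row in rows)
    # Indices of rows whose poisoned context does not contain the raw attack.
    missing = [i for i, row in enumerate(rows) if row["attack_str"] not in row["context"]]
    return {"per_position": dict(per_position), "rows_missing_attack": missing}
```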

Usage Examples

Basic usage -- build a QA prompt injection dataset:

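from bipia.data import AutoPIABuilder

# from_name() returns the builder class registered for the "qa" task.
builder = AutoPIABuilder.from_name("qa")
# Instantiate the class with a fixed random seed.
seeded_builder = builder(seed=42)
# Calling the instance builds the poisoned dataset as a pandas DataFrame.
df = seeded_builder("path/to/contexts.jsonl", "path/to/attacks.json")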

With stealth mode enabled (base64-encoded attacks):

from bipia.data import AutoPIABuilder

builder = AutoPIABuilder.from_name("email")
seeded_builder = builder(seed=42)
df = seeded_builder("path/to/contexts.jsonl", "path/to/attacks.json", enable_stealth=True)

Related Pages

Principle:Microsoft_BIPIA_Dataset_Preparation
