Implementation:Microsoft BIPIA AutoPIABuilder
| Field | Value |
|---|---|
| Sources | Repository: Microsoft BIPIA |
| Domains | NLP, Security, Data_Engineering |
| Last Updated | 2026-02-14 |
Overview
AutoPIABuilder is the BIPIA benchmark library's entry point for constructing prompt injection attack datasets.
Description
AutoPIABuilder is a factory class that maps task names to specialized builder classes within the BIPIA framework. It exposes a single class method, from_name(), which resolves a task name string to the corresponding builder class. from_name() returns the class itself, not an instance, so callers instantiate it (for example, with seed=42) before use. The supported mappings are:
| Task Name | Builder Class |
|---|---|
| "code" | CodeIPIABuilder |
| "qa" | QAIPIADataset |
| "table" | TableIPIABuilder |
| "email" | EmailIPIABuilder |
| "abstract" | AbstractIPIADataset |
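The name-to-class dispatch can be pictured as a lookup table. The sketch below is not the bipia source: the five builder classes are empty stand-ins named after the table above, and the dict-based lookup and the ValueError raised for unknown names are assumptions about behavior the library may implement differently.

```python
from typing import Dict, Type


class BasePIABuilder:
    """Hypothetical stand-in for bipia's base builder."""

    def __init__(self, seed=None):
        self.seed = seed


# Hypothetical empty stubs named after the classes in the mapping table.
class CodeIPIABuilder(BasePIABuilder): ...
class QAIPIADataset(BasePIABuilder): ...
class TableIPIABuilder(BasePIABuilder): ...
class EmailIPIABuilder(BasePIABuilder): ...
class AbstractIPIADataset(BasePIABuilder): ...


class AutoPIABuilder:
    # Maps each task name to a builder *class*, not an instance.
    _REGISTRY: Dict[str, Type[BasePIABuilder]] = {
        "code": CodeIPIABuilder,
        "qa": QAIPIADataset,
        "table": TableIPIABuilder,
        "email": EmailIPIABuilder,
        "abstract": AbstractIPIADataset,
    }

    @classmethod
    def from_name(cls, name: str) -> Type[BasePIABuilder]:
        """Return the builder class registered for `name`."""
        try:
            return cls._REGISTRY[name]
        except KeyError:
            raise ValueError(
                f"unknown task {name!r}; expected one of {sorted(cls._REGISTRY)}"
            ) from None
```

Because the factory returns a class, the caller decides how to construct it, e.g. `AutoPIABuilder.from_name("qa")(seed=42)`.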
Each builder inherits from BasePIABuilder and implements the logic for constructing poisoned datasets by combining clean task-specific contexts with adversarial attack strings at configurable insertion positions (start, middle, end). The factory pattern allows callers to work with a uniform interface regardless of the underlying task type, while each specialized builder handles task-specific context parsing and metadata preservation.
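To make the three insertion positions concrete, here is a minimal sketch of position-specific insertion functions. They are hypothetical stand-ins that reuse the names from the signature in the Code Reference section; the library's own insert_start, insert_middle, and insert_end may take extra arguments and may choose the middle split point differently (this sketch simply splits the context on spaces at the midpoint word).

```python
def insert_start(context: str, attack: str) -> str:
    """Place the attack string before the clean context."""
    return f"{attack}\n{context}"


def insert_end(context: str, attack: str) -> str:
    """Place the attack string after the clean context."""
    return f"{context}\n{attack}"


def insert_middle(context: str, attack: str) -> str:
    """Place the attack string at the midpoint word of the context.

    Assumption: "middle" means the middle space-separated token boundary.
    """
    words = context.split(" ")
    mid = len(words) // 2
    return " ".join(words[:mid] + [attack] + words[mid:])
```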
Usage
Import AutoPIABuilder when you need to create a prompt injection attack dataset for any of the five BIPIA benchmark tasks. Each instantiated builder is a single callable pipeline: it loads the clean contexts and the attacks, generates every combination of context, attack, and insertion position (a cross product), and assembles the result into a pandas DataFrame.
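The cross-product step can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: it takes in-memory contexts (dicts with hypothetical "context", "task_name", and "ideal" keys) and an attacks dict of name to string instead of file paths, uses simplified stand-in insertion functions, and returns a list of row dicts. With pandas installed, pd.DataFrame(rows) turns the result into a frame. Recording the raw, unencoded string in attack_str under stealth mode is also an assumption.

```python
import base64
import itertools
from typing import Callable, Dict, List, Sequence

InsertFn = Callable[[str, str], str]


# Simplified stand-ins for the library's insertion functions.
def insert_end(context: str, attack: str) -> str:
    return f"{context}\n{attack}"


def insert_start(context: str, attack: str) -> str:
    return f"{attack}\n{context}"


def insert_middle(context: str, attack: str) -> str:
    words = context.split(" ")
    mid = len(words) // 2
    return " ".join(words[:mid] + [attack] + words[mid:])


def build_rows(
    contexts: Sequence[Dict[str, str]],
    attacks: Dict[str, str],
    insert_fns: Sequence[InsertFn] = (insert_end, insert_start, insert_middle),
    insert_fn_names: Sequence[str] = ("end", "start", "middle"),
    enable_stealth: bool = False,
) -> List[Dict[str, str]]:
    """Pair every context with every attack at every insertion position."""
    rows = []
    positions = list(zip(insert_fns, insert_fn_names))
    for ctx, (attack_name, attack_str), (insert_fn, position) in itertools.product(
        contexts, attacks.items(), positions
    ):
        payload = attack_str
        if enable_stealth:
            # Stealth mode: inject a base64-encoded copy of the attack text.
            payload = base64.b64encode(attack_str.encode("utf-8")).decode("ascii")
        rows.append(
            {
                "context": insert_fn(ctx["context"], payload),
                "attack_name": attack_name,
                "attack_str": attack_str,  # assumption: raw text is recorded
                "task_name": ctx["task_name"],
                "ideal": ctx["ideal"],
                "position": position,
            }
        )
    return rows
```

Tuples are used as defaults here to sidestep Python's shared-mutable-default pitfall; the documented signature uses lists.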
Code Reference
Source Location: Repository: BIPIA, File: bipia/data/__init__.py, Lines: L10-31
Signature:
```python
class AutoPIABuilder:
    @classmethod
    def from_name(cls, name: str) -> type[BasePIABuilder]:
        """Resolve a task name to its specialized builder class."""


class BasePIABuilder:
    def __call__(
        self,
        contexts: str,
        attacks: str,
        insert_fns=[insert_end, insert_start, insert_middle],
        insert_fn_names=["end", "start", "middle"],
        enable_stealth: bool = False,
    ) -> pd.DataFrame:
        ...
```
Import:
```python
from bipia.data import AutoPIABuilder
```
I/O Contract
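Inputs (name is passed to from_name(); the remaining parameters are passed when calling an instantiated builder):
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Task name: "code", "qa", "table", "email", or "abstract" |
| contexts | str | Yes | Path to a JSONL file containing clean context data |
| attacks | str | Yes | Path to a JSON file containing attack definitions |
| insert_fns | list | No | Insertion functions to apply (defaults to [insert_end, insert_start, insert_middle]) |
| insert_fn_names | list | No | Position labels written to the position column, one per entry in insert_fns (defaults to ["end", "start", "middle"]) |
| enable_stealth | bool | No | Whether to base64-encode attack strings before insertion (defaults to False) |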
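The I/O contract fixes the file formats (JSONL for contexts, JSON for attacks) but not their schemas. The loader sketch below assumes a hypothetical layout: one JSON object per non-blank context line, and an attacks file that maps a category name to a list of attack strings, flattened into names of the form "category-index". Check the files shipped with BIPIA before relying on either layout.

```python
import json
from typing import Dict, List


def load_contexts(path: str) -> List[dict]:
    """Read a JSONL file: one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_attacks(path: str) -> Dict[str, str]:
    """Flatten {category: [attack, ...]} into {"category-i": attack}.

    Hypothetical schema and naming scheme, for illustration only.
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {
        f"{category}-{i}": attack
        for category, attack_list in raw.items()
        for i, attack in enumerate(attack_list)
    }
```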
Outputs:
pd.DataFrame with the following columns:
| Column | Description |
|---|---|
| context | The poisoned context (original context with the attack string injected) |
| attack_name | Identifier for the attack type used |
| attack_str | The raw attack string that was injected |
| task_name | The task type (e.g., "qa", "code") |
| ideal | The ideal (correct) answer for the original clean task |
| question | The associated question (present for QA tasks) |
| position | Insertion position: "end", "start", or "middle" |
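A quick sanity check on a generated dataset can confirm the column schema and, assuming the output is the full cross product described in the Usage section, the row count (contexts × attacks × positions). The helper below is hypothetical and works on a list of row dicts; a DataFrame can be converted with df.to_dict("records"). The question column is not required because it only appears for QA tasks.

```python
from collections import Counter
from typing import Iterable, Mapping

REQUIRED_COLUMNS = {"context", "attack_name", "attack_str", "task_name", "ideal", "position"}


def check_rows(
    rows: Iterable[Mapping[str, object]],
    n_contexts: int,
    n_attacks: int,
    positions: frozenset = frozenset({"end", "start", "middle"}),
) -> Counter:
    """Validate output rows and return how many rows use each position."""
    rows = list(rows)
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns {sorted(missing)}")
        if row["position"] not in positions:
            raise ValueError(f"row {i} has unexpected position {row['position']!r}")
    # Cross product: one row per (context, attack, position) triple.
    expected = n_contexts * n_attacks * len(positions)
    if len(rows) != expected:
        raise ValueError(f"expected {expected} rows, got {len(rows)}")
    return Counter(row["position"] for row in rows)
```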
Usage Examples
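Basic usage: build a QA prompt injection dataset.
```python
from bipia.data import AutoPIABuilder

# from_name() returns the builder class; instantiate it with a seed.
builder_cls = AutoPIABuilder.from_name("qa")
builder = builder_cls(seed=42)
# Calling the builder loads both files and returns a pandas DataFrame.
df = builder("path/to/contexts.jsonl", "path/to/attacks.json")
```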
With stealth mode enabled (base64-encoded attacks):
```python
from bipia.data import AutoPIABuilder

builder_cls = AutoPIABuilder.from_name("email")
builder = builder_cls(seed=42)
# enable_stealth=True base64-encodes each attack string before insertion.
df = builder("path/to/contexts.jsonl", "path/to/attacks.json", enable_stealth=True)
```