Implementation:Microsoft BIPIA AutoPIABuilder

From Leeroopedia
Field         Value
Sources       Repository: Microsoft BIPIA
Domains       NLP, Security, Data_Engineering
Last Updated  2026-02-14

Overview

A concrete tool, provided by the BIPIA benchmark library, for constructing prompt injection attack datasets.

Description

AutoPIABuilder is a factory class that maps task names to specialized builder classes within the BIPIA framework. It exposes a single class method, from_name(), which resolves a task name string to the appropriate builder class. The supported mappings are:

Task Name    Builder Class
"code"       CodeIPIABuilder
"qa"         QAIPIADataset
"table"      TableIPIABuilder
"email"      EmailIPIABuilder
"abstract"   AbstractIPIADataset

Each builder inherits from BasePIABuilder and implements the logic for constructing poisoned datasets by combining clean task-specific contexts with adversarial attack strings at configurable insertion positions (start, middle, end). The factory pattern allows callers to work with a uniform interface regardless of the underlying task type, while each specialized builder handles task-specific context parsing and metadata preservation.
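The three insertion positions can be illustrated with a minimal sketch. The helper names put_start, put_end, and put_middle are hypothetical stand-ins, and the newline-based splitting is an assumption; the actual BIPIA insert_start, insert_end, and insert_middle functions may split contexts differently or take extra arguments.

```python
def put_start(context: str, attack: str) -> str:
    """Hypothetical helper: place the attack before the context."""
    return f"{attack}\n{context}"


def put_end(context: str, attack: str) -> str:
    """Hypothetical helper: place the attack after the context."""
    return f"{context}\n{attack}"


def put_middle(context: str, attack: str) -> str:
    """Hypothetical helper: place the attack at the midpoint line.

    Assumption: the context is split on newlines. The real library
    may choose the insertion point in another way.
    """
    lines = context.split("\n")
    mid = len(lines) // 2
    return "\n".join(lines[:mid] + [attack] + lines[mid:])


clean = "line one\nline two\nline three\nline four"
poisoned = put_middle(clean, "INJECTED TEXT")
```

Here the injected text lands between the second and third lines of the clean context, while the surrounding content is left unchanged.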

Usage

Import AutoPIABuilder when you need to create a prompt injection attack dataset for any of the five BIPIA benchmark tasks. Calling a builder instance handles context loading, attack loading, cross-product generation, and DataFrame assembly in a single pipeline.
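The cross-product step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the BIPIA implementation: build_rows is a hypothetical function, the contexts and attacks are in-memory placeholders rather than files, and the record keys simply mirror the output columns documented on this page.

```python
import itertools
from typing import Callable, Dict, List

Record = Dict[str, str]


def build_rows(
    contexts: List[Record],
    attacks: Dict[str, str],
    insert_fns: Dict[str, Callable[[str, str], str]],
    task_name: str,
) -> List[Record]:
    """Hypothetical helper: one row per (context, attack, position)."""
    rows = []
    for ctx, (attack_name, attack_str), (position, insert) in itertools.product(
        contexts, attacks.items(), insert_fns.items()
    ):
        rows.append(
            {
                "context": insert(ctx["context"], attack_str),
                "attack_name": attack_name,
                "attack_str": attack_str,
                "task_name": task_name,
                "ideal": ctx["ideal"],
                "question": ctx["question"],
                "position": position,
            }
        )
    return rows


# Placeholder data; the real pipeline reads these from files.
contexts = [
    {"context": "Paris is the capital of France.", "ideal": "Paris",
     "question": "What is the capital of France?"},
    {"context": "Water boils at 100 C at sea level.", "ideal": "100 C",
     "question": "At what temperature does water boil at sea level?"},
]
attacks = {"attack-a": "PLACEHOLDER ATTACK A", "attack-b": "PLACEHOLDER ATTACK B"}
insert_fns = {
    "end": lambda c, a: f"{c}\n{a}",
    "start": lambda c, a: f"{a}\n{c}",
}

rows = build_rows(contexts, attacks, insert_fns, task_name="qa")
```

With two contexts, two attacks, and two positions, the sketch yields eight rows. A list of dictionaries like this could then be passed to pd.DataFrame to assemble the final table.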

Code Reference

Source Location: Repository: BIPIA, File: bipia/data/__init__.py, Lines: L10-31

Signature:

class AutoPIABuilder:
    @classmethod
    def from_name(cls, name: str) -> type[BasePIABuilder]:
        """Resolve a task name to its specialized builder class."""

class BasePIABuilder:
    def __call__(
        self,
        contexts: str,
        attacks: str,
        insert_fns=[insert_end, insert_start, insert_middle],
        insert_fn_names=["end", "start", "middle"],
        enable_stealth: bool = False,
    ) -> pd.DataFrame:
        ...

Import:

from bipia.data import AutoPIABuilder
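Because from_name() returns a class rather than an instance, the factory amounts to a lookup in a name-to-class registry. The following is a minimal sketch of that pattern, not the BIPIA source: AutoBuilder, BaseBuilder, QABuilder, and EmailBuilder are hypothetical stand-ins, and raising ValueError for unknown names is an assumption about error handling.

```python
from typing import Dict, Type


class BaseBuilder:
    """Hypothetical stand-in for BasePIABuilder."""

    def __init__(self, seed: int = 0) -> None:
        self.seed = seed


class QABuilder(BaseBuilder):
    """Hypothetical stand-in for the QA builder."""


class EmailBuilder(BaseBuilder):
    """Hypothetical stand-in for the email builder."""


class AutoBuilder:
    """Illustrative factory; only two of the five tasks are registered."""

    _registry: Dict[str, Type[BaseBuilder]] = {
        "qa": QABuilder,
        "email": EmailBuilder,
    }

    @classmethod
    def from_name(cls, name: str) -> Type[BaseBuilder]:
        # Return the class itself; the caller instantiates it.
        # Assumption: unknown names raise ValueError.
        if name not in cls._registry:
            raise ValueError(f"Unknown task name: {name!r}")
        return cls._registry[name]


builder_cls = AutoBuilder.from_name("qa")
builder = builder_cls(seed=42)
```

Returning the class keeps construction arguments such as the seed in the caller's hands, which matches the two-step pattern in the usage examples on this page.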

I/O Contract

Inputs:

Parameter        Type  Required  Description
name             str   Yes       Task name passed to from_name(): "code", "qa", "table", "email", or "abstract"
contexts         str   Yes       Path to JSONL file containing clean context data
attacks          str   Yes       Path to JSON file containing attack definitions
insert_fns       list  No        List of insertion functions (defaults to [insert_end, insert_start, insert_middle])
insert_fn_names  list  No        Position labels matching insert_fns, in the same order (defaults to ["end", "start", "middle"])
enable_stealth   bool  No        Whether to base64-encode attack strings before insertion (defaults to False)
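The base64 encoding that enable_stealth refers to can be shown with Python's standard base64 module. The encode_attack helper is hypothetical, and whether BIPIA wraps the encoded string in additional text before insertion is not stated here, so this sketch covers only the encoding step.

```python
import base64


def encode_attack(attack: str) -> str:
    """Hypothetical helper: base64-encode an attack string as ASCII text."""
    return base64.b64encode(attack.encode("utf-8")).decode("ascii")


encoded = encode_attack("hello")  # "aGVsbG8="
# Decoding recovers the original string.
decoded = base64.b64decode(encoded).decode("utf-8")
```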

Outputs:

pd.DataFrame with the following columns:

Column       Description
context      The poisoned context (original context with injected attack string)
attack_name  Identifier for the attack type used
attack_str   The raw attack string that was injected
task_name    The task type (e.g., "qa", "code")
ideal        The ideal (correct) answer for the original clean task
question     The associated question (present for QA tasks)
position     Insertion position: "end", "start", or "middle"

Usage Examples

Basic usage -- build a QA prompt injection dataset:

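from bipia.data import AutoPIABuilder

# from_name() returns the builder class for the task, not an instance.
builder_cls = AutoPIABuilder.from_name("qa")
# Instantiate with a seed, then call the instance to build the DataFrame.
builder = builder_cls(seed=42)
df = builder("path/to/contexts.jsonl", "path/to/attacks.json")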

With stealth mode enabled (base64-encoded attacks):

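from bipia.data import AutoPIABuilder

# from_name() returns the builder class for the task, not an instance.
builder_cls = AutoPIABuilder.from_name("email")
builder = builder_cls(seed=42)
# enable_stealth=True base64-encodes each attack string before insertion.
df = builder("path/to/contexts.jsonl", "path/to/attacks.json", enable_stealth=True)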
