Implementation:Microsoft BIPIA Prompt Construction Pipeline

Field	Value
Sources	BIPIA repository
Domains	NLP, Prompt_Engineering
Last Updated	2026-02-14

Overview

Concrete tool for constructing model-specific prompts from poisoned dataset examples provided by the BIPIA benchmark library.

Description

The pipeline involves two components that work together to transform raw dataset rows into model-ready inputs.

Component 1: BasePIABuilder.construct_prompt(). This is an abstract method defined in the base class and implemented by each task-specific builder. Concrete implementations include EmailIPIABuilder, QAIPIADataset, SummIPIABuilder, and others. Each implementation extracts the relevant fields from a dataset example (such as context, question, and attack payload) and assembles them into a prompt. When require_system_prompt is True, it returns a tuple of (system_prompt, user_prompt). When False, it returns a single combined string. The optional ign_guidance parameter allows injecting ignore-guidance instructions for defense experiments.

Component 2: Model-specific process_fn() methods. Each model wrapper implements its own formatting logic:

GPTModelWSystem.process_fn creates OpenAI chat messages with both system and user roles. It calls prompt_construct_fn with require_system_prompt=True and packages the result as a list of message dictionaries.
GPTModelWOSystem.process_fn creates OpenAI chat messages without a system prompt. It calls prompt_construct_fn with require_system_prompt=False and places the combined string into a single user message.
LLMModel.process_fn uses FastChat conversation templates to format the prompt for open-source HuggingFace models. It populates the conversation object with system and user messages, generates the full prompt string, and tokenizes it to produce input_ids and attention_mask.
vLLMModel.process_fn also uses FastChat conversation templates but does not tokenize. It returns the formatted prompt as a raw string suitable for vLLM's text-based inference API.

Usage

Used as a map function over datasets before inference. The construct_prompt method is passed as a partial to process_fn, allowing the dataset's .map() call to apply both prompt construction and model formatting in a single pass.

Code Reference

Attribute	Details
Source	BIPIA repository
Files	`bipia/data/base.py` (L43-44, `construct_prompt` abstract), `bipia/model/gpt.py` (L167-193 `GPTModelWSystem.process_fn`, L207-234 `GPTModelWOSystem.process_fn`), `bipia/model/llm_worker.py` (L166-188 `LLMModel.process_fn`), `bipia/model/vllm_worker.py` (L79-100 `vLLMModel.process_fn`)

Signatures:

BasePIABuilder.construct_prompt(
    self,
    example: Any,
    require_system_prompt: bool = True,
    ign_guidance: str = ""
) -> Tuple[str, str] | str

GPTModelWSystem.process_fn(
    self,
    example: Any,
    prompt_construct_fn: Callable
) -> Any

LLMModel.process_fn(
    self,
    example: Any,
    prompt_construct_fn: Callable
) -> Any

Import:

from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM

I/O Contract

Inputs
Name	Type	Required	Description
`example`	dict	Yes	A dataset row containing fields such as `context`, `question`, `attack_name`, and other task-specific keys.
`prompt_construct_fn`	Callable	Yes	A partial of `construct_prompt` with `require_system_prompt` and `ign_guidance` already bound.

Outputs
Name	Type	Always Present	Description
`message`	str or list[dict]	Yes	The formatted prompt. A list of role/content dictionaries for OpenAI chat models, or a raw string for FastChat and vLLM models.
`input_ids`	Tensor	No	Tokenized input IDs. Present only for HuggingFace models (via `LLMModel.process_fn`).
`attention_mask`	Tensor	No	Attention mask tensor. Present only for HuggingFace models (via `LLMModel.process_fn`).

The output is a modified copy of the input example dict with the above fields added.

Usage Examples

from functools import partial
from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM

# Build task-specific prompt constructor
pia_builder = AutoPIABuilder.from_name(task_name)

# Build model wrapper
llm = AutoLLM.from_name(model_name)

# Create partial with system-prompt requirement bound
prompt_fn = partial(
    pia_builder.construct_prompt,
    require_system_prompt=llm.require_system_prompt
)

# Apply prompt construction and model formatting across the dataset
dataset = dataset.map(
    partial(llm.process_fn, prompt_construct_fn=prompt_fn)
)

Related Pages

Principle:Microsoft_BIPIA_Prompt_Construction

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment