Implementation:Microsoft BIPIA Prompt Construction Pipeline
| Field | Value |
|---|---|
| Sources | BIPIA repository |
| Domains | NLP, Prompt_Engineering |
| Last Updated | 2026-02-14 |
Overview
Concrete tool for constructing model-specific prompts from poisoned dataset examples provided by the BIPIA benchmark library.
Description
The pipeline involves two components that work together to transform raw dataset rows into model-ready inputs.
Component 1: BasePIABuilder.construct_prompt(). This is an abstract method defined in the base class and implemented by each task-specific builder. Concrete implementations include EmailIPIABuilder, QAIPIADataset, SummIPIABuilder, and others. Each implementation extracts the relevant fields from a dataset example (such as context, question, and attack payload) and assembles them into a prompt. When require_system_prompt is True, it returns a tuple of (system_prompt, user_prompt). When False, it returns a single combined string. The optional ign_guidance parameter allows injecting ignore-guidance instructions for defense experiments.
Component 2: Model-specific process_fn() methods. Each model wrapper implements its own formatting logic:
GPTModelWSystem.process_fncreates OpenAI chat messages with bothsystemanduserroles. It callsprompt_construct_fnwithrequire_system_prompt=Trueand packages the result as a list of message dictionaries.GPTModelWOSystem.process_fncreates OpenAI chat messages without a system prompt. It callsprompt_construct_fnwithrequire_system_prompt=Falseand places the combined string into a singleusermessage.LLMModel.process_fnuses FastChat conversation templates to format the prompt for open-source HuggingFace models. It populates the conversation object with system and user messages, generates the full prompt string, and tokenizes it to produceinput_idsandattention_mask.vLLMModel.process_fnalso uses FastChat conversation templates but does not tokenize. It returns the formatted prompt as a raw string suitable for vLLM's text-based inference API.
Usage
Used as a map function over datasets before inference. The construct_prompt method is passed as a partial to process_fn, allowing the dataset's .map() call to apply both prompt construction and model formatting in a single pass.
Code Reference
| Attribute | Details |
|---|---|
| Source | BIPIA repository |
| Files | bipia/data/base.py (L43-44, construct_prompt abstract), bipia/model/gpt.py (L167-193 GPTModelWSystem.process_fn, L207-234 GPTModelWOSystem.process_fn), bipia/model/llm_worker.py (L166-188 LLMModel.process_fn), bipia/model/vllm_worker.py (L79-100 vLLMModel.process_fn)
|
Signatures:
BasePIABuilder.construct_prompt(
self,
example: Any,
require_system_prompt: bool = True,
ign_guidance: str = ""
) -> Tuple[str, str] | str
GPTModelWSystem.process_fn(
self,
example: Any,
prompt_construct_fn: Callable
) -> Any
LLMModel.process_fn(
self,
example: Any,
prompt_construct_fn: Callable
) -> Any
Import:
from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM
I/O Contract
| Inputs | |||
|---|---|---|---|
| Name | Type | Required | Description |
example |
dict | Yes | A dataset row containing fields such as context, question, attack_name, and other task-specific keys.
|
prompt_construct_fn |
Callable | Yes | A partial of construct_prompt with require_system_prompt and ign_guidance already bound.
|
| Outputs | |||
|---|---|---|---|
| Name | Type | Always Present | Description |
message |
str or list[dict] | Yes | The formatted prompt. A list of role/content dictionaries for OpenAI chat models, or a raw string for FastChat and vLLM models. |
input_ids |
Tensor | No | Tokenized input IDs. Present only for HuggingFace models (via LLMModel.process_fn).
|
attention_mask |
Tensor | No | Attention mask tensor. Present only for HuggingFace models (via LLMModel.process_fn).
|
The output is a modified copy of the input example dict with the above fields added.
Usage Examples
from functools import partial
from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM
# Build task-specific prompt constructor
pia_builder = AutoPIABuilder.from_name(task_name)
# Build model wrapper
llm = AutoLLM.from_name(model_name)
# Create partial with system-prompt requirement bound
prompt_fn = partial(
pia_builder.construct_prompt,
require_system_prompt=llm.require_system_prompt
)
# Apply prompt construction and model formatting across the dataset
dataset = dataset.map(
partial(llm.process_fn, prompt_construct_fn=prompt_fn)
)