Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Microsoft BIPIA Prompt Construction Pipeline

From Leeroopedia
Revision as of 13:15, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Microsoft_BIPIA_Prompt_Construction_Pipeline.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Field Value
Sources BIPIA repository
Domains NLP, Prompt_Engineering
Last Updated 2026-02-14

Overview

Concrete tool for constructing model-specific prompts from poisoned dataset examples provided by the BIPIA benchmark library.

Description

The pipeline involves two components that work together to transform raw dataset rows into model-ready inputs.

Component 1: BasePIABuilder.construct_prompt(). This is an abstract method defined in the base class and implemented by each task-specific builder. Concrete implementations include EmailIPIABuilder, QAIPIADataset, SummIPIABuilder, and others. Each implementation extracts the relevant fields from a dataset example (such as context, question, and attack payload) and assembles them into a prompt. When require_system_prompt is True, it returns a tuple of (system_prompt, user_prompt). When False, it returns a single combined string. The optional ign_guidance parameter allows injecting ignore-guidance instructions for defense experiments.

Component 2: Model-specific process_fn() methods. Each model wrapper implements its own formatting logic:

  • GPTModelWSystem.process_fn creates OpenAI chat messages with both system and user roles. It calls prompt_construct_fn with require_system_prompt=True and packages the result as a list of message dictionaries.
  • GPTModelWOSystem.process_fn creates OpenAI chat messages without a system prompt. It calls prompt_construct_fn with require_system_prompt=False and places the combined string into a single user message.
  • LLMModel.process_fn uses FastChat conversation templates to format the prompt for open-source HuggingFace models. It populates the conversation object with system and user messages, generates the full prompt string, and tokenizes it to produce input_ids and attention_mask.
  • vLLMModel.process_fn also uses FastChat conversation templates but does not tokenize. It returns the formatted prompt as a raw string suitable for vLLM's text-based inference API.

Usage

Used as a map function over datasets before inference. The construct_prompt method is passed as a partial to process_fn, allowing the dataset's .map() call to apply both prompt construction and model formatting in a single pass.

Code Reference

Attribute Details
Source BIPIA repository
Files bipia/data/base.py (L43-44, construct_prompt abstract), bipia/model/gpt.py (L167-193 GPTModelWSystem.process_fn, L207-234 GPTModelWOSystem.process_fn), bipia/model/llm_worker.py (L166-188 LLMModel.process_fn), bipia/model/vllm_worker.py (L79-100 vLLMModel.process_fn)

Signatures:

BasePIABuilder.construct_prompt(
    self,
    example: Any,
    require_system_prompt: bool = True,
    ign_guidance: str = ""
) -> Tuple[str, str] | str
GPTModelWSystem.process_fn(
    self,
    example: Any,
    prompt_construct_fn: Callable
) -> Any
LLMModel.process_fn(
    self,
    example: Any,
    prompt_construct_fn: Callable
) -> Any

Import:

from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM

I/O Contract

Inputs
Name Type Required Description
example dict Yes A dataset row containing fields such as context, question, attack_name, and other task-specific keys.
prompt_construct_fn Callable Yes A partial of construct_prompt with require_system_prompt and ign_guidance already bound.
Outputs
Name Type Always Present Description
message str or list[dict] Yes The formatted prompt. A list of role/content dictionaries for OpenAI chat models, or a raw string for FastChat and vLLM models.
input_ids Tensor No Tokenized input IDs. Present only for HuggingFace models (via LLMModel.process_fn).
attention_mask Tensor No Attention mask tensor. Present only for HuggingFace models (via LLMModel.process_fn).

The output is a modified copy of the input example dict with the above fields added.

Usage Examples

from functools import partial
from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM

# Build task-specific prompt constructor
pia_builder = AutoPIABuilder.from_name(task_name)

# Build model wrapper
llm = AutoLLM.from_name(model_name)

# Create partial with system-prompt requirement bound
prompt_fn = partial(
    pia_builder.construct_prompt,
    require_system_prompt=llm.require_system_prompt
)

# Apply prompt construction and model formatting across the dataset
dataset = dataset.map(
    partial(llm.process_fn, prompt_construct_fn=prompt_fn)
)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment