Implementation: Microsoft BIPIA Clean Response Inference
Overview
A concrete tool for collecting clean (attack-free) model responses, provided as part of the BIPIA benchmark examples.
Description
The inference() function in collect_clean_response.py builds a dataset using AutoPIABuilder with no_insert as the only insertion function and {"None": ""} as a dummy attack dict. This produces clean prompts that contain only the original task context. The function then runs the same inference pipeline as the main run.py, but its output is clean responses, which serve as training data for defenses.
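The key idea is that the insertion function receives the attack payload but discards it. A minimal illustrative sketch of such a no-op insertion function (the actual signature of bipia.data.utils.no_insert may differ):

```python
# Illustrative sketch only: an insertion function that performs no
# insertion. It receives the task context and an attack payload but
# returns the context unchanged, so the built prompt is attack-free.
def no_insert(context: str, attack: str) -> str:
    # Ignore the attack payload; keep the original task context.
    return context

clean = no_insert("Answer using the passage below.", "IGNORE PREVIOUS INSTRUCTIONS")
```

Paired with the dummy attack dict {"None": ""}, this yields exactly one "attack" variant whose payload is empty and is never inserted.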
Usage
Run via CLI:
```shell
python examples/collect_clean_response.py \
  --mode inference \
  --dataset_name qa \
  --context_data_file /path/to/context_data.jsonl \
  --llm_config_file /path/to/llm_config.yaml \
  --output_path /path/to/output/ \
  --split train
```
Code Reference
| Field | Value |
|---|---|
| Source | BIPIA repo |
| File | examples/collect_clean_response.py |
| Lines | L228-406 |
| Signature | def inference(args) -> None |
| Key call | pia_builder(args.context_data_file, {"None": ""}, insert_fns=[no_insert]) |
| Import | from bipia.data.utils import no_insert |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| dataset_name | str | Yes | Task type (e.g., "qa", "email", "code") |
| context_data_file | str | Yes | Path to the context data JSONL file |
| llm_config_file | str | Yes | Path to the LLM configuration YAML file |
| output_path | str | Yes | Directory for output files |
| split | str | Yes | Dataset split ("train" or "test") |
| batch_size | int | No | Inference batch size |
| max_new_tokens | int | No | Maximum tokens to generate per response |
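The contract above maps naturally onto a command-line parser. An illustrative argparse sketch of these parameters (flag definitions and defaults are assumptions; the actual parser in collect_clean_response.py may differ):

```python
import argparse

# Illustrative parser matching the I/O contract; defaults for
# batch_size and max_new_tokens are assumed values, not the script's.
parser = argparse.ArgumentParser()
parser.add_argument("--mode", choices=["inference"], required=True)
parser.add_argument("--dataset_name", required=True)
parser.add_argument("--context_data_file", required=True)
parser.add_argument("--llm_config_file", required=True)
parser.add_argument("--output_path", required=True)
parser.add_argument("--split", choices=["train", "test"], required=True)
parser.add_argument("--batch_size", type=int, default=8)
parser.add_argument("--max_new_tokens", type=int, default=512)

args = parser.parse_args([
    "--mode", "inference",
    "--dataset_name", "qa",
    "--context_data_file", "data/qa/context.jsonl",
    "--llm_config_file", "configs/llm.yaml",
    "--output_path", "out/",
    "--split", "train",
])
```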
Outputs
JSONL file with the following fields per record:
| Field | Description |
|---|---|
| attack_name | Always "None-0" (dummy attack identifier) |
| task_name | The task identifier from the dataset |
| response | The clean model response (no attack present) |
| message | The full prompt sent to the model |
| target | The ideal ground-truth answer |
| position | The position field from the dataset |
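Each output line is one JSON object with the fields above. A small illustrative reader for one such record (the sample values are hypothetical; only the field names and the "None-0" attack identifier come from the contract above):

```python
import json

# Illustrative parser for one line of the clean-response JSONL output.
def parse_clean_record(line: str) -> dict:
    rec = json.loads(line)
    # Every clean record carries the dummy attack identifier "None-0".
    assert rec["attack_name"] == "None-0"
    return rec

sample = ('{"attack_name": "None-0", "task_name": "qa-0", '
          '"response": "Paris.", "message": "What is the capital of France?", '
          '"target": "Paris", "position": "middle"}')
record = parse_clean_record(sample)
```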
Usage Examples
CLI invocation:
```shell
python examples/collect_clean_response.py \
  --mode inference \
  --dataset_name qa \
  --context_data_file data/qa/context.jsonl \
  --llm_config_file configs/llama2_7b.yaml \
  --output_path output/clean_responses/ \
  --split train \
  --batch_size 8 \
  --max_new_tokens 512
```
Key no_insert pattern (inside inference()):
```python
from bipia.data.utils import no_insert

# Build the clean dataset: no_insert returns the context unchanged,
# and the attack dict is a dummy with a single empty payload.
pia_dataset = pia_builder(
    args.context_data_file,
    {"None": ""},
    insert_fns=[no_insert],
)
```