Implementation:Microsoft BIPIA Clean Response Inference

Overview

Concrete tool, provided in the BIPIA benchmark examples, for collecting clean (attack-free) model responses.

Description

The inference() function in collect_clean_response.py builds a dataset with AutoPIABuilder, passing no_insert as the only insertion function and {"None": ""} as a dummy attack dict. Because no attack text is inserted, each prompt contains only the original task context. The function then runs the same inference pipeline as the main run.py, and the resulting clean responses serve as training data for defenses.
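
To make the clean-prompt idea concrete, the following is a minimal, self-contained sketch of crossing contexts with a dummy attack dict and a no-op insertion function. It does not use the BIPIA library: no_insert_sketch and build_clean_prompts are hypothetical stand-ins, the two-argument signature is an assumption rather than the real signature of no_insert, and the sketch does not reproduce how the real pipeline derives the "None-0" attack name.

```python
from typing import Callable, Dict, List


def no_insert_sketch(context: str, attack: str) -> str:
    """Hypothetical stand-in for no_insert: ignore the attack, keep the context."""
    return context


def build_clean_prompts(
    contexts: List[str],
    attacks: Dict[str, str],
    insert_fns: List[Callable[[str, str], str]],
) -> List[dict]:
    """Cross every context with every attack and every insertion function."""
    rows = []
    for context in contexts:
        for attack_name, attack_text in attacks.items():
            for insert_fn in insert_fns:
                rows.append(
                    {
                        "attack_name": attack_name,
                        "context": insert_fn(context, attack_text),
                    }
                )
    return rows


contexts = [
    "Paris is the capital of France.",
    "Water boils at 100 C at sea level.",
]
clean_rows = build_clean_prompts(contexts, {"None": ""}, [no_insert_sketch])

# One dummy attack and one insertion function: exactly one clean row per context.
assert len(clean_rows) == len(contexts)
assert [row["context"] for row in clean_rows] == contexts
```

With a single dummy attack and a single insertion function, the cross product collapses to one unmodified row per context, which is why the output can be treated as attack-free.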

Usage

Run via CLI:

python examples/collect_clean_response.py \
    --mode inference \
    --dataset_name qa \
    --context_data_file /path/to/context_data.jsonl \
    --llm_config_file /path/to/llm_config.yaml \
    --output_path /path/to/output/ \
    --split train

Code Reference

Field | Value
Source | BIPIA repo
File | examples/collect_clean_response.py
Lines | L228-406
Signature | def inference(args) -> None
Key call | pia_builder(args.context_data_file, {"None": ""}, insert_fns=[no_insert])
Import | from bipia.data.utils import no_insert

I/O Contract

Inputs

Parameter | Type | Required | Description
dataset_name | str | Yes | Task type (e.g., "qa", "email", "code")
context_data_file | str | Yes | Path to the context data JSONL file
llm_config_file | str | Yes | Path to the LLM configuration YAML file
output_path | str | Yes | Directory for output files
split | str | Yes | Dataset split ("train" or "test")
batch_size | int | No | Inference batch size
max_new_tokens | int | No | Maximum tokens to generate per response
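
The input contract above can be mirrored in a small argparse parser. This is an illustrative sketch only, not the parser used by collect_clean_response.py: build_parser is a hypothetical name, the flag names and types come from the table and the CLI invocation on this page, and the optional flags are left without defaults because the script's real defaults are not stated here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented inputs."""
    parser = argparse.ArgumentParser(description="Collect clean responses (sketch)")
    # --mode appears in the CLI invocation; its allowed values are not listed here.
    parser.add_argument("--mode", type=str, required=True)
    parser.add_argument("--dataset_name", type=str, required=True)
    parser.add_argument("--context_data_file", type=str, required=True)
    parser.add_argument("--llm_config_file", type=str, required=True)
    parser.add_argument("--output_path", type=str, required=True)
    parser.add_argument("--split", type=str, required=True, choices=["train", "test"])
    # Optional flags: argparse falls back to None when they are omitted.
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--max_new_tokens", type=int)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
demo_args = build_parser().parse_args(
    [
        "--mode", "inference",
        "--dataset_name", "qa",
        "--context_data_file", "data/qa/context.jsonl",
        "--llm_config_file", "configs/llama2_7b.yaml",
        "--output_path", "output/clean_responses/",
        "--split", "train",
        "--batch_size", "8",
    ]
)
print(demo_args.dataset_name, demo_args.batch_size, demo_args.max_new_tokens)
```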

Outputs

JSONL file with the following fields per record:

Field | Description
attack_name | Always "None-0" (dummy attack identifier)
task_name | The task identifier from the dataset
response | The clean model response (no attack present)
message | The full prompt sent to the model
target | The ideal ground-truth answer
position | The position field from the dataset
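
A consumer of this file may want to check each record against the fields listed above before using it as training data. The following sketch uses only the Python standard library. load_clean_responses is a hypothetical helper, and the output file name is not specified on this page, so the path is passed in by the caller.

```python
import json
from typing import List

# Field names taken from the output table above.
EXPECTED_FIELDS = {
    "attack_name",
    "task_name",
    "response",
    "message",
    "target",
    "position",
}


def load_clean_responses(path: str) -> List[dict]:
    """Read a JSONL file and validate every record against the documented fields."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = EXPECTED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            # The page states that clean records always carry this identifier.
            if record["attack_name"] != "None-0":
                raise ValueError(f"line {line_no}: unexpected attack_name")
            records.append(record)
    return records
```

Failing fast on a missing field or an unexpected attack_name helps keep attacked samples from slipping into a dataset that is meant to be clean.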

Usage Examples

CLI invocation:

python examples/collect_clean_response.py \
    --mode inference \
    --dataset_name qa \
    --context_data_file data/qa/context.jsonl \
    --llm_config_file configs/llama2_7b.yaml \
    --output_path output/clean_responses/ \
    --split train \
    --batch_size 8 \
    --max_new_tokens 512

Key no_insert pattern (inside inference()):

from bipia.data.utils import no_insert

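# Build the clean dataset. pia_builder is the builder obtained from
# AutoPIABuilder inside inference(). no_insert leaves each context unchanged,
# and {"None": ""} is a dummy attack dict whose only attack string is empty.
pia_dataset = pia_builder(
    args.context_data_file,
    {"None": ""},
    insert_fns=[no_insert],
)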
