Principle:Princeton nlp SimPO Response Post Processing
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-08 04:30 GMT |
Overview
A data cleaning step that combines multi-seed responses per prompt and filters out prompts where all generated responses are identical.
Description
After generating responses with multiple random seeds, the outputs must be consolidated. Post-processing reads all per-seed output files, groups responses by prompt, and filters out prompts where every seed produced exactly the same response, since identical candidates cannot form meaningful preference pairs. This ensures that the downstream reward model annotation step has diverse candidates to score.
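The consolidation described above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the function name `load_seed_outputs`, the file pattern `output_*.json`, and the record keys `prompt` and `generated_text` are all assumptions made for the example.

```python
import json
from pathlib import Path


def load_seed_outputs(directory, pattern="output_*.json"):
    """Group responses by prompt across all per-seed files in `directory`.

    Assumes each file holds a JSON list of records shaped like
    {"prompt": ..., "generated_text": ...}. The file pattern and the
    key names are hypothetical and may differ from the real pipeline.
    """
    grouped = {}  # dicts keep insertion order, so prompt order is stable
    # Sort the paths so the order of responses per prompt is deterministic.
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
        for record in records:
            grouped.setdefault(record["prompt"], []).append(record["generated_text"])
    return grouped
```

Keying on the prompt text rather than on list position means the grouping still works if the per-seed files list prompts in different orders, at the cost of merging any prompts that are exact duplicates of each other.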
Usage
Use this principle after multi-seed response generation and before reward model annotation. It is the second step of the three-step on-policy data generation pipeline.
Theoretical Basis
The filtering logic follows a simple diversity criterion:
- Aggregation: For each prompt, collect all responses across seeds into a single list
- Deduplication check: If all responses in the list are identical (len(set(responses)) == 1), discard the prompt
- Output: Produce a single file containing, for each remaining prompt, its list of responses
Pseudo-code:
# Abstract algorithm (NOT real implementation)
for prompt in all_prompts:
    responses = [seed_outputs[seed][prompt] for seed in seeds]
    if len(set(responses)) > 1:  # At least one response differs
        output.append({"prompt": prompt, "all_generated_responses": responses})
    # else: discard (all responses identical)
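For reference, a self-contained, runnable version of the filter on toy in-memory data might look like the sketch below. The function name `filter_identical` and the sample prompts are hypothetical; only the `len(set(responses)) > 1` criterion and the output keys come from the pseudo-code above.

```python
def filter_identical(grouped):
    """Keep only prompts whose responses are not all identical.

    `grouped` maps each prompt to the list of responses collected
    across seeds (a hypothetical in-memory shape for this sketch).
    """
    kept = []
    for prompt, responses in grouped.items():
        # A set collapses exact duplicates; more than one element means
        # at least two seeds produced different text.
        if len(set(responses)) > 1:
            kept.append({"prompt": prompt, "all_generated_responses": responses})
    return kept


# Toy data: the first prompt produced the same answer for every seed.
grouped = {
    "Name the smallest prime number.": ["2", "2", "2"],
    "Suggest a name for a cat.": ["Milo", "Luna", "Milo"],
}
result = filter_identical(grouped)
print([item["prompt"] for item in result])
```

The comparison is exact string equality, so responses that differ only in whitespace or punctuation count as distinct and the prompt is kept. The full response list, including any repeated entries, is preserved for prompts that pass the check.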