
Principle:Princeton nlp SimPO Response Post Processing

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-08 04:30 GMT

Overview

A data cleaning step that combines multi-seed responses per prompt and filters out prompts where all generated responses are identical.

Description

After generating responses with multiple random seeds, the outputs must be consolidated. Post-processing reads all per-seed output files, groups responses by prompt, and filters out prompts where every seed produced the exact same response (since identical candidates cannot form meaningful preference pairs). This step is necessary to ensure the downstream reward model annotation step has diverse candidates to score.
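The consolidation described above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the function name `group_responses_by_prompt`, the one-JSON-file-per-seed layout, and the `"prompt"` / `"response"` record keys are all assumptions made for the example.

```python
import json
from collections import defaultdict


def group_responses_by_prompt(seed_files):
    """Merge per-seed JSON files into {prompt: [response, ...]}.

    Assumed (hypothetical) file layout: each file holds a JSON list of
    objects with "prompt" and "response" keys.
    """
    grouped = defaultdict(list)
    for path in seed_files:
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
        for record in records:
            # Responses are appended in the order the seed files are given.
            grouped[record["prompt"]].append(record["response"])
    return dict(grouped)
```

Under these assumptions, a prompt that is missing from one seed's file simply ends up with a shorter response list, so a real pipeline may want to check that every prompt has one response per seed.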

Usage

Use this principle after multi-seed response generation and before reward model annotation. It is the second step of the three-step on-policy data generation pipeline (response generation, post-processing, reward model annotation).

Theoretical Basis

The filtering logic follows a simple diversity criterion:

  1. Aggregation — For each prompt, collect all responses across seeds into a single list
  2. Deduplication check — If all responses in the list are identical (len(set(responses)) == 1), discard the prompt
  3. Output — Produce a single file with diverse response sets per prompt

Pseudo-code:

# Abstract algorithm (NOT real implementation)
output = []
for prompt in all_prompts:
    responses = [seed_outputs[seed][prompt] for seed in seeds]
    if len(set(responses)) > 1:  # at least two distinct responses
        output.append({"prompt": prompt, "all_generated_responses": responses})
    # else: discard (all responses identical)
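
For readers who want something executable, the diversity criterion can be tried on toy data with the sketch below. The function name `drop_uniform_prompts` and the sample prompts are hypothetical; only the `len(set(...)) > 1` test and the `all_generated_responses` key come from the pseudo-code above.

```python
def drop_uniform_prompts(grouped):
    """Keep only prompts whose responses are not all identical.

    `grouped` maps each prompt string to its list of per-seed responses.
    """
    kept = []
    for prompt, responses in grouped.items():
        # set() collapses exact duplicates; more than one element means
        # at least two seeds produced different text.
        if len(set(responses)) > 1:
            kept.append({"prompt": prompt, "all_generated_responses": responses})
    return kept


# Toy data: the first prompt got the same answer from every seed.
grouped = {
    "What is 2 + 2?": ["4", "4", "4"],
    "Name a color.": ["Red", "Blue", "Red"],
}
kept = drop_uniform_prompts(grouped)
print(kept)
# [{'prompt': 'Name a color.', 'all_generated_responses': ['Red', 'Blue', 'Red']}]
```

The comparison is exact string equality, so responses that differ only in whitespace or casing count as distinct. Partial duplicates (as in the second prompt) are kept in the list; only prompts with no variation at all are dropped.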

Related Pages

Implemented By
