Implementation:OpenRLHF OpenRLHF Rejection sampling processor
| Knowledge Sources | |
|---|---|
| Domains | Alignment, Data_Processing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete OpenRLHF tool for best-of-N selection via rejection sampling: for each prompt, it keeps only the highest-reward response.
Description
The rejection_sampling_processor function takes a list of scored generation objects (input, output, reward), groups them by input prompt, and keeps only the highest-reward response for each prompt. The result is a filtered SFT-compatible dataset.
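The grouping and selection described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the OpenRLHF source: the name `select_best_per_prompt` is hypothetical, and the tie-breaking rule (the first-seen response wins on equal reward) is an assumption of the sketch.

```python
from typing import Dict, List


def select_best_per_prompt(objs: List[dict]) -> List[dict]:
    """Keep the highest-reward response for each unique prompt (illustrative stand-in)."""
    best: Dict[str, dict] = {}
    for obj in objs:
        prompt = obj["input"]
        # Rewards may arrive as strings after serialization, so coerce to float.
        reward = float(obj["reward"])
        # Strict ">" means the first-seen response wins a tie (assumption of this sketch).
        if prompt not in best or reward > best[prompt]["reward"]:
            best[prompt] = {"input": prompt, "output": obj["output"], "reward": reward}
    # Dicts preserve insertion order, so prompts come out in first-seen order.
    return list(best.values())


scored = [
    {"input": "What is 2+2?", "output": "5", "reward": -1.0},
    {"input": "What is 2+2?", "output": "4", "reward": 2.5},
    {"input": "Name a prime.", "output": "7", "reward": 1.0},
]
best = select_best_per_prompt(scored)
```

With N sampled responses per prompt, the output shrinks from N × (number of prompts) records to one record per prompt.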
Usage
Call this processor after batch vLLM generation and batch reward model inference, once every generation carries a reward score. Its output is used to create a new SFT dataset for retraining.
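One way to persist the filtered records for the retraining step is a JSON Lines file, one record per line. The sketch below is an assumption about the hand-off, not a description of how OpenRLHF writes its output: the helper name `write_jsonl` and the file name `rs_sft.jsonl` are hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper, not part of OpenRLHF)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


# Example record in the shape the processor returns.
filtered = [{"input": "What is 2+2?", "output": "4", "reward": 2.5}]

out_path = os.path.join(tempfile.mkdtemp(), "rs_sft.jsonl")  # hypothetical file name
write_jsonl(filtered, out_path)

# Read the file back to confirm the round trip.
with open(out_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```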
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/utils/processor.py
- Lines: L40-53
Signature
def rejection_sampling_processor(args, objs):
    """
    Select best response per prompt by reward score.

    Args:
        args: CLI arguments (unused in this processor)
        objs: List of dicts with keys: "input", "output", "reward"

    Returns:
        List of dicts: [{"input": str, "output": str, "reward": float}]
        One entry per unique prompt with the highest-reward response.
    """
Import
from openrlhf.utils.processor import rejection_sampling_processor
# or
from openrlhf.utils.processor import get_processor
processor = get_processor("rs")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | CLI arguments (unused in this processor) |
| objs | List[Dict] | Yes | Scored generations: [{input, output, reward}, ...] |
Outputs
| Name | Type | Description |
|---|---|---|
| filtered | List[Dict] | Best response per prompt: [{input, output, reward}, ...] |
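The record shape in the tables above can be written down as a type, and inputs can be checked against it before they reach the processor. This is a minimal sketch under stated assumptions: the `ScoredGeneration` type and the `find_malformed` helper are hypothetical names introduced here, not part of OpenRLHF.

```python
from typing import List, TypedDict


class ScoredGeneration(TypedDict):
    """Shape of one scored record (hypothetical type, for illustration only)."""
    input: str
    output: str
    reward: float


REQUIRED_KEYS = ("input", "output", "reward")


def find_malformed(objs: List[dict]) -> List[int]:
    """Return indices of records missing a required key or with a non-numeric reward."""
    bad = []
    for i, obj in enumerate(objs):
        if any(key not in obj for key in REQUIRED_KEYS):
            bad.append(i)
            continue
        try:
            float(obj["reward"])
        except (TypeError, ValueError):
            bad.append(i)
    return bad


records = [
    {"input": "p", "output": "a", "reward": 0.3},
    {"input": "p", "output": "b"},                   # missing "reward"
    {"input": "q", "output": "c", "reward": "n/a"},  # reward is not numeric
]
bad_indices = find_malformed(records)
```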
Usage Examples
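from openrlhf.utils.processor import get_processor

# "rs" selects the rejection sampling processor
processor = get_processor("rs")
# args: parsed CLI arguments; scored_generations: list of {"input", "output", "reward"} dicts
filtered_data = processor(args, scored_generations)
# filtered_data contains one best response per unique prompt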
Related Pages
Implements Principle