Principle:CarperAI Trlx Rejection Fine Tuning

Knowledge Sources	Best-of-N Sampling
Domains	Reinforcement_Learning, NLP, Fine_Tuning
Last Updated	2026-02-07 16:00 GMT

Overview

Training method that generates multiple completions per prompt, scores them, and fine-tunes the model on the highest-scoring subset using progressive quality thresholds.

Description

Rejection Fine-Tuning (RFT) is an alternative to policy gradient methods such as PPO for aligning language models with reward signals. Instead of estimating a policy gradient from rewards, RFT generates N completions per prompt, scores each with a reward function, keeps those at or above a percentile threshold, and trains on the kept completions with standard supervised learning (cross-entropy loss). The percentile threshold increases progressively over training, raising the quality bar as the model improves.

Usage

Use this principle when a simpler alternative to PPO is desired, or when the reward function is expensive to evaluate: RFT scores completions only during data collection and uses the scores solely for filtering, so the training step itself is plain supervised learning. It is particularly effective when the model already performs reasonably well and needs refinement.

Theoretical Basis

RFT optimizes a filtered maximum likelihood objective:

$\mathcal{L}_{\mathrm{RFT}} = -\mathbb{E}_{x \sim D} \left[ \sum_{y \in \mathrm{Top}_{k}(G(x))} \log \pi_{\theta}(y \mid x) \right]$

where $G(x) = \{y_{1}, \dots, y_{N}\} \sim \pi_{\theta}(\cdot \mid x)$ are the $N$ sampled completions and $\mathrm{Top}_{k}$ selects those whose score is at or above the percentile threshold.
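The filtered objective can be illustrated for a single prompt with a few lines of Python. This is a minimal sketch, not the trlx implementation: the helper names `percentile` and `filtered_nll` are hypothetical, the percentile helper is a standard-library stand-in that takes a fraction in [0, 1] and interpolates linearly, and the log-probabilities and scores are made-up numbers.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile for a fraction q in [0, 1].

    Hypothetical standard-library stand-in for a NumPy-style percentile.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filtered_nll(log_probs, scores, q):
    """Negative log-likelihood summed over completions scoring at or above the threshold."""
    threshold = percentile(scores, q)
    return -sum(lp for lp, s in zip(log_probs, scores) if s >= threshold)


# Made-up values for N = 4 completions of one prompt.
log_probs = [-2.0, -1.0, -3.0, -0.5]   # log pi_theta(y | x) for each completion
scores = [0.1, 0.9, 0.5, 0.7]          # reward for each completion

loss = filtered_nll(log_probs, scores, q=0.5)
```

With a threshold at the median, only the completions scored 0.9 and 0.7 are kept, so the loss is the sum of their negative log-probabilities, 1.0 + 0.5 = 1.5. The other two completions contribute nothing to the gradient.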

Progressive Thresholding: $p_{t} = p_{\mathrm{start}} + \frac{t}{T} (p_{\mathrm{end}} - p_{\mathrm{start}})$

where $p_{t}$ is the percentile threshold at step $t$ and $T$ is the total number of improvement steps.
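As a worked example of the schedule, the sketch below evaluates the linear interpolation for a few steps. The function name `percentile_at` and the values 0.5, 0.9 and 4 are illustrative assumptions, not settings taken from trlx.

```python
def percentile_at(step: int, total_steps: int, p_start: float, p_end: float) -> float:
    """Linear schedule p_t = p_start + (t / T) * (p_end - p_start)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return p_start + (step / total_steps) * (p_end - p_start)


# Illustrative values: raise the threshold from the 50th to the 90th percentile over T = 4 steps.
schedule = [percentile_at(t, 4, 0.5, 0.9) for t in range(5)]
```

The schedule comes out at roughly 0.5, 0.6, 0.7, 0.8 and 0.9. Note that $p_{\mathrm{end}}$ is reached only at $t = T$; a loop that runs $t$ from 0 to $T - 1$ stops one increment short of it.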

Pseudo-code Logic:

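# Abstract algorithm (NOT the real implementation).
# generate, reward_fn and train_supervised are placeholders.
import numpy as np

for step in range(n_improve_steps):
    # Linearly raise the percentile from start_percentile towards end_percentile
    percentile = start_percentile + step / n_improve_steps * (end_percentile - start_percentile)
    for prompt in prompts:
        completions = generate(prompt, n=n_generations)  # sample N completions
        scores = reward_fn(completions)                  # one score per completion
        # np.percentile expects a value in [0, 100]
        threshold = np.percentile(scores, percentile * 100)
        best = [c for c, s in zip(completions, scores) if s >= threshold]
        train_supervised(model, prompt, best)            # cross-entropy on kept completions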
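To make the loop above concrete, here is a self-contained toy version that runs with the standard library only. Everything in it is an assumption for illustration: the "policy" is a categorical distribution over four candidate completions, `REWARDS` is a made-up fixed reward table, and `sft_step` is a single gradient-ascent step on the mean log-likelihood of the kept samples. It is a sketch of the idea, not the trlx trainer.

```python
import math
import random

REWARDS = [0.0, 0.3, 0.6, 1.0]  # hypothetical fixed reward per candidate completion


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def percentile(values, q):
    """Linear-interpolation percentile for a fraction q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def select_best(samples, scores, q):
    """Keep the samples whose score is at or above the q-th percentile."""
    threshold = percentile(scores, q)
    return [c for c, s in zip(samples, scores) if s >= threshold]


def sft_step(logits, selected, lr):
    """One gradient-ascent step on the mean log-likelihood of the selected samples."""
    if not selected:
        return list(logits)
    probs = softmax(logits)
    # d/d logit_j of log softmax(logits)[y] is 1[j == y] - p_j; average over selected y.
    grad = [-p for p in probs]
    for y in selected:
        grad[y] += 1.0 / len(selected)
    return [l + lr * g for l, g in zip(logits, grad)]


def run_rft(n_improve_steps=10, n_generations=8, start_percentile=0.5,
            end_percentile=0.9, lr=1.0, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(REWARDS)  # toy policy starts uniform
    for step in range(n_improve_steps):
        q = start_percentile + step / n_improve_steps * (end_percentile - start_percentile)
        probs = softmax(logits)
        # "Generate" N completions by sampling indices from the current policy.
        samples = rng.choices(range(len(REWARDS)), weights=probs, k=n_generations)
        scores = [REWARDS[c] for c in samples]
        best = select_best(samples, scores, q)
        logits = sft_step(logits, best, lr)
    return softmax(logits)


final_probs = run_rft()
```

Because the highest-scoring sample in a batch always passes the threshold, each update moves probability mass toward whatever scored best among the sampled completions. In this toy setting one would expect the distribution to drift toward the higher-reward candidates, although any single run depends on the random seed.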

Related Pages

Implementation:CarperAI_Trlx_Accelerate_RFT_Trainer
