Overview
Description
Evaluation in RewardTrainer reuses the compute_loss method on the evaluation dataset, producing the same Bradley-Terry loss and metrics (accuracy, margin, reward statistics) as during training. Model saving is handled through the inherited save_model method from Trainer, with TRL adding automatic model card generation during checkpoint saves via the overridden _save_checkpoint method.
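For context, the Bradley-Terry preference loss mentioned above is the negative log-sigmoid of the reward margin between the chosen and rejected responses. A minimal NumPy sketch of that formula (an illustration, not TRL's actual implementation):

```python
import numpy as np

def bradley_terry_loss(chosen_rewards, rejected_rewards):
    """Mean -log(sigmoid(r_chosen - r_rejected)) over a batch of pairs."""
    margin = np.asarray(chosen_rewards) - np.asarray(rejected_rewards)
    # log1p(exp(-margin)) equals -log(sigmoid(margin))
    return float(np.mean(np.log1p(np.exp(-margin))))
```

With equal rewards the loss is log 2 ≈ 0.693; it falls toward zero as the chosen reward pulls ahead of the rejected one.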
Usage
Evaluation runs automatically during training when eval_strategy is configured. Model saving can be triggered explicitly via trainer.save_model(output_dir) or happens automatically at checkpoint intervals. The reward training script calls trainer.save_model(training_args.output_dir) after trainer.train() completes.
Code Reference
Source Location
- _save_checkpoint: trl/trainer/reward_trainer.py lines 630-636
- Reward training script (save/push): trl/scripts/reward.py lines 81-86
Signature
```python
def _save_checkpoint(self, model, trial):
    """
    Save checkpoint with automatic model card generation.

    Creates a model card with the model name derived from hub_model_id
    or output_dir, then delegates to the parent Trainer._save_checkpoint.
    """
    if self.args.hub_model_id is None:
        model_name = Path(self.args.output_dir).name
    else:
        model_name = self.args.hub_model_id.split("/")[-1]
    self.create_model_card(model_name=model_name)
    super()._save_checkpoint(model, trial)


def save_model(self, output_dir=None) -> None:
    """
    Inherited from transformers.Trainer.

    Saves the model weights, tokenizer, and training arguments to the
    specified output directory. For PEFT models, saves only the adapter
    weights.
    """
```
Import
```python
from trl import RewardTrainer, RewardConfig
```
I/O Contract
save_model Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| output_dir | str or None | None | Directory to save the model; defaults to args.output_dir |
save_model Outputs
| Output | Location | Description |
|---|---|---|
| Model weights | output_dir/model.safetensors | Full model weights or PEFT adapter weights |
| Tokenizer files | output_dir/ | Tokenizer configuration and vocabulary files |
| Training args | output_dir/training_args.bin | Serialized RewardConfig |
| Model card | output_dir/README.md | Auto-generated model card with training metadata |
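The saved artifacts listed in the outputs table can be sanity-checked on disk after save_model returns. A hedged sketch (verify_reward_checkpoint is a hypothetical helper, not a TRL API; note that PEFT checkpoints store adapter weights instead of model.safetensors):

```python
from pathlib import Path

EXPECTED_FILES = ["training_args.bin", "README.md"]

def verify_reward_checkpoint(output_dir):
    """Return the expected artifacts that are missing from output_dir."""
    out = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (out / name).exists()]
    # Weights may be a full model (model.safetensors) or a PEFT adapter
    if not ((out / "model.safetensors").exists()
            or (out / "adapter_model.safetensors").exists()):
        missing.append("model weights")
    return missing
```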
Evaluation Metrics
| Metric | Key | Description |
|---|---|---|
| Evaluation loss | eval_loss | Bradley-Terry preference loss on the evaluation set |
| Evaluation accuracy | eval_accuracy | Fraction of correctly ranked preference pairs |
| Evaluation margin | eval_margin | Mean reward difference (chosen - rejected) |
| Min reward | eval_min_reward | Minimum reward in the evaluation batch |
| Mean reward | eval_mean_reward | Mean reward across all evaluation responses |
| Max reward | eval_max_reward | Maximum reward in the evaluation batch |
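The metrics in the table above are simple reductions over the per-pair rewards. A NumPy sketch under those same definitions (illustrative only, not the RewardTrainer code):

```python
import numpy as np

def eval_reward_metrics(chosen_rewards, rejected_rewards):
    """Compute eval metrics from aligned chosen/rejected reward arrays."""
    chosen = np.asarray(chosen_rewards, dtype=float)
    rejected = np.asarray(rejected_rewards, dtype=float)
    margin = chosen - rejected
    rewards = np.concatenate([chosen, rejected])
    return {
        "eval_accuracy": float(np.mean(margin > 0)),  # correctly ranked pairs
        "eval_margin": float(np.mean(margin)),        # mean chosen - rejected
        "eval_min_reward": float(rewards.min()),
        "eval_mean_reward": float(rewards.mean()),
        "eval_max_reward": float(rewards.max()),
    }
```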
Usage Examples
Training Script Pattern
```python
from datasets import load_dataset
from trl import RewardTrainer, RewardConfig

dataset = load_dataset("trl-lib/ultrafeedback_binarized")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=RewardConfig(
        output_dir="reward-model",
        eval_strategy="steps",
        eval_steps=500,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

# Train (evaluation runs automatically at eval_steps intervals)
trainer.train()

# Save final model
trainer.save_model("reward-model-final")

# Optionally push to the Hugging Face Hub
if trainer.args.push_to_hub:
    trainer.push_to_hub(dataset_name="trl-lib/ultrafeedback_binarized")
```
Loading Saved Reward Model for PPO
```python
from transformers import AutoModelForSequenceClassification

# Load the saved reward model for downstream PPO training
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "reward-model-final",
    num_labels=1,
)

# Also initialize the value model from the same checkpoint
value_model = AutoModelForSequenceClassification.from_pretrained(
    "reward-model-final",
    num_labels=1,
)
```
Related Pages