Overview
Description
Evaluation in RewardTrainer reuses the compute_loss method on the evaluation dataset, producing the same Bradley-Terry loss and metrics (accuracy, margin, reward statistics) as during training. Model saving is handled through the inherited save_model method from Trainer, with TRL adding automatic model card generation during checkpoint saves via the overridden _save_checkpoint method.
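For context, the Bradley-Terry preference loss mentioned above is the negative log-sigmoid of the reward margin between the chosen and rejected responses. A minimal NumPy sketch of that formula (an illustration, not TRL's actual implementation):

```python
import numpy as np

def bradley_terry_loss(chosen_rewards, rejected_rewards):
    """Mean -log(sigmoid(r_chosen - r_rejected)) over a batch of pairs."""
    margin = np.asarray(chosen_rewards) - np.asarray(rejected_rewards)
    # log1p(exp(-margin)) equals -log(sigmoid(margin))
    return float(np.mean(np.log1p(np.exp(-margin))))
```

With equal rewards the loss is log 2 ≈ 0.693; it falls toward zero as the chosen reward pulls ahead of the rejected one.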
Usage
Evaluation runs automatically during training when eval_strategy is configured. Model saving can be triggered explicitly via trainer.save_model(output_dir) or happens automatically at checkpoint intervals. The reward training script calls trainer.save_model(training_args.output_dir) after trainer.train() completes.
Code Reference
Source Location
- _save_checkpoint: trl/trainer/reward_trainer.py lines 630-636
- Reward training script (save/push): trl/scripts/reward.py lines 81-86
Signature
```python
def _save_checkpoint(self, model, trial):
    """
    Save checkpoint with automatic model card generation.

    Creates a model card with the model name derived from hub_model_id
    or output_dir, then delegates to the parent Trainer._save_checkpoint.
    """
    if self.args.hub_model_id is None:
        model_name = Path(self.args.output_dir).name
    else:
        model_name = self.args.hub_model_id.split("/")[-1]
    self.create_model_card(model_name=model_name)
    super()._save_checkpoint(model, trial)


def save_model(self, output_dir=None) -> None:
    """
    Inherited from transformers.Trainer.

    Saves the model weights, tokenizer, and training arguments to the
    specified output directory. For PEFT models, saves only the adapter
    weights.
    """
```
Import
```python
from trl import RewardTrainer, RewardConfig
```
I/O Contract
save_model Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| output_dir | str or None | None | Directory to save the model; defaults to args.output_dir |
save_model Outputs
| Output | Location | Description |
|---|---|---|
| Model weights | output_dir/model.safetensors | Full model weights or PEFT adapter weights |
| Tokenizer files | output_dir/ | Tokenizer configuration and vocabulary files |
| Training args | output_dir/training_args.bin | Serialized RewardConfig |
| Model card | output_dir/README.md | Auto-generated model card with training metadata |
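The saved artifacts listed in the outputs table can be sanity-checked on disk after save_model returns. A hedged sketch (verify_reward_checkpoint is a hypothetical helper, not a TRL API; note that PEFT checkpoints store adapter weights instead of model.safetensors):

```python
from pathlib import Path

EXPECTED_FILES = ["training_args.bin", "README.md"]

def verify_reward_checkpoint(output_dir):
    """Return the expected artifacts that are missing from output_dir."""
    out = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (out / name).exists()]
    # Weights may be a full model (model.safetensors) or a PEFT adapter
    if not ((out / "model.safetensors").exists()
            or (out / "adapter_model.safetensors").exists()):
        missing.append("model weights")
    return missing
```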
Evaluation Metrics
| Metric | Key | Description |
|---|---|---|
| Evaluation loss | eval_loss | Bradley-Terry preference loss on the evaluation set |
| Evaluation accuracy | eval_accuracy | Fraction of correctly ranked preference pairs |
| Evaluation margin | eval_margin | Mean reward difference (chosen - rejected) |
| Min reward | eval_min_reward | Minimum reward in the evaluation batch |
| Mean reward | eval_mean_reward | Mean reward across all evaluation responses |
| Max reward | eval_max_reward | Maximum reward in the evaluation batch |
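The metrics in the table above are simple reductions over the per-pair rewards. A NumPy sketch under those same definitions (illustrative only, not the RewardTrainer code):

```python
import numpy as np

def eval_reward_metrics(chosen_rewards, rejected_rewards):
    """Compute eval metrics from aligned chosen/rejected reward arrays."""
    chosen = np.asarray(chosen_rewards, dtype=float)
    rejected = np.asarray(rejected_rewards, dtype=float)
    margin = chosen - rejected
    rewards = np.concatenate([chosen, rejected])
    return {
        "eval_accuracy": float(np.mean(margin > 0)),  # correctly ranked pairs
        "eval_margin": float(np.mean(margin)),        # mean chosen - rejected
        "eval_min_reward": float(rewards.min()),
        "eval_mean_reward": float(rewards.mean()),
        "eval_max_reward": float(rewards.max()),
    }
```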
Usage Examples
Training Script Pattern
```python
from datasets import load_dataset
from trl import RewardTrainer, RewardConfig

dataset = load_dataset("trl-lib/ultrafeedback_binarized")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=RewardConfig(
        output_dir="reward-model",
        eval_strategy="steps",
        eval_steps=500,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

# Train (evaluation runs automatically at eval_steps intervals)
trainer.train()

# Save final model
trainer.save_model("reward-model-final")

# Optionally push to the Hugging Face Hub
if trainer.args.push_to_hub:
    trainer.push_to_hub(dataset_name="trl-lib/ultrafeedback_binarized")
```
Loading Saved Reward Model for PPO
```python
from transformers import AutoModelForSequenceClassification

# Load the saved reward model for downstream PPO training
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "reward-model-final",
    num_labels=1,
)

# Also initialize the value model from the same checkpoint
value_model = AutoModelForSequenceClassification.from_pretrained(
    "reward-model-final",
    num_labels=1,
)
```
Related Pages