
Implementation:Huggingface Trl RewardTrainer Evaluate Save

From Leeroopedia


Property Value
Implementation Name RewardTrainer Evaluate Save
Technology Huggingface TRL
Type External Tool Doc
Workflow Reward Model Training
Principle Principle:Huggingface_Trl_Reward_Evaluation_and_Saving

Overview

Description

Evaluation in RewardTrainer reuses the compute_loss method on the evaluation dataset, producing the same Bradley-Terry loss and metrics (accuracy, margin, reward statistics) as during training. Model saving is handled through the inherited save_model method from Trainer, with TRL adding automatic model card generation during checkpoint saves via the overridden _save_checkpoint method.
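The Bradley-Terry loss referenced above scores each chosen/rejected pair as -log σ(r_chosen - r_rejected), so the loss shrinks as the chosen response's reward pulls ahead of the rejected one. A minimal numeric sketch in plain Python (illustrative only, not the TRL implementation):

```python
import math

def bradley_terry_loss(chosen_reward, rejected_reward):
    # -log(sigmoid(chosen - rejected)): low when the chosen response
    # scores clearly above the rejected one, high when the ranking inverts.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(bradley_terry_loss(2.0, 0.0))  # confident correct ranking -> low loss
print(bradley_terry_loss(0.0, 2.0))  # inverted ranking -> high loss
```

When the two rewards tie, the loss is exactly log 2, which is why an untrained reward model typically starts near that value.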

Usage

Evaluation runs automatically during training when eval_strategy is configured. Model saving can be triggered explicitly via trainer.save_model(output_dir) or happens automatically at checkpoint intervals. The reward training script calls trainer.save_model(training_args.output_dir) after trainer.train() completes.

Code Reference

Source Location

  • _save_checkpoint: trl/trainer/reward_trainer.py lines 630-636
  • Reward training script (save/push): trl/scripts/reward.py lines 81-86

Signature

def _save_checkpoint(self, model, trial):
    """
    Save checkpoint with automatic model card generation.

    Creates a model card with the model name derived from
    hub_model_id or output_dir, then delegates to the parent
    Trainer._save_checkpoint.
    """
    if self.args.hub_model_id is None:
        model_name = Path(self.args.output_dir).name
    else:
        model_name = self.args.hub_model_id.split("/")[-1]
    self.create_model_card(model_name=model_name)
    super()._save_checkpoint(model, trial)

def save_model(self, output_dir=None) -> None:
    """
    Inherited from transformers.Trainer.

    Saves the model weights, tokenizer, and training arguments
    to the specified output directory. For PEFT models, saves
    only the adapter weights.
    """

Import

from trl import RewardTrainer, RewardConfig

I/O Contract

save_model Inputs

Parameter Type Default Description
output_dir str or None None Directory to save the model; defaults to args.output_dir

save_model Outputs

Output Location Description
Model weights output_dir/model.safetensors Full model weights or PEFT adapter weights
Tokenizer files output_dir/ Tokenizer configuration and vocabulary files
Training args output_dir/training_args.bin Serialized RewardConfig
Model card output_dir/README.md Auto-generated model card with training metadata

Evaluation Metrics

Metric Key Description
Evaluation loss eval_loss Bradley-Terry preference loss on evaluation set
Evaluation accuracy eval_accuracy Fraction of correctly ranked preference pairs
Evaluation margin eval_margin Mean reward difference (chosen - rejected)
Min reward eval_min_reward Minimum reward in evaluation batch
Mean reward eval_mean_reward Mean reward across all evaluation responses
Max reward eval_max_reward Maximum reward in evaluation batch
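Given per-pair scalar rewards, the aggregate metrics in the table reduce to simple statistics. A hedged sketch of how such values could be computed (illustrative only, not the TRL implementation):

```python
def eval_reward_metrics(chosen, rejected):
    # chosen/rejected: equal-length lists of scalar rewards for preference pairs.
    n = len(chosen)
    all_rewards = chosen + rejected
    return {
        "eval_accuracy": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "eval_margin": sum(c - r for c, r in zip(chosen, rejected)) / n,
        "eval_min_reward": min(all_rewards),
        "eval_mean_reward": sum(all_rewards) / len(all_rewards),
        "eval_max_reward": max(all_rewards),
    }

metrics = eval_reward_metrics(chosen=[1.5, 0.2, 2.0], rejected=[0.5, 0.8, -1.0])
print(metrics)
```

Note that accuracy and margin are computed over pairs, while the min/mean/max statistics pool chosen and rejected rewards together.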

Usage Examples

Training Script Pattern

from trl import RewardTrainer, RewardConfig
from datasets import load_dataset

dataset = load_dataset("trl-lib/ultrafeedback_binarized")

trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=RewardConfig(
        output_dir="reward-model",
        eval_strategy="steps",
        eval_steps=500,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

# Train (evaluation runs automatically at eval_steps intervals)
trainer.train()

# Save final model
trainer.save_model("reward-model-final")

# Optionally push to Huggingface Hub
if trainer.args.push_to_hub:
    trainer.push_to_hub(dataset_name="trl-lib/ultrafeedback_binarized")

Loading Saved Reward Model for PPO

from transformers import AutoModelForSequenceClassification

# Load the saved reward model for downstream PPO training
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "reward-model-final",
    num_labels=1,
)

# Also initialize the value model from the same checkpoint
value_model = AutoModelForSequenceClassification.from_pretrained(
    "reward-model-final",
    num_labels=1,
)
