Implementation:Hiyouga LLaMA Factory RM Trainer
| Knowledge Sources | |
|---|---|
| Domains | Reward Modeling, RLHF, Trainer |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
PairwiseTrainer is a custom HuggingFace Trainer subclass that implements Bradley-Terry pairwise loss for reward model training.
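For reference, the Bradley-Terry model gives the probability that the chosen response $y^{c}$ is preferred over the rejected response $y^{r}$ for a prompt $x$ as $\sigma\big(r_\theta(x, y^{c}) - r_\theta(x, y^{r})\big)$. Here $r_\theta$ is the scalar reward produced by the model and $\sigma$ is the logistic sigmoid. Minimizing the negative log-likelihood over a batch of $B$ pairs gives the training objective:

$$
\mathcal{L}(\theta) = -\frac{1}{B}\sum_{i=1}^{B} \log \sigma\!\left(r_\theta(x_i, y_i^{c}) - r_\theta(x_i, y_i^{r})\right)
$$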
Description
PairwiseTrainer extends the HuggingFace Trainer class and overrides compute_loss. The loss computation splits a concatenated batch into chosen and rejected halves, uses gather to extract the scalar reward at the last non-padding token of each sequence, and returns the negative log-sigmoid of the chosen-minus-rejected score difference, averaged over the batch. The class also registers FixValueHeadModelCallback so that the value head is checkpointed correctly, and it overrides create_optimizer and create_scheduler to support custom optimizers and schedulers. In _save, it removes tensors that share storage with another tensor so the checkpoint can be written as safetensors. Finally, a save_predictions method writes the chosen and rejected reward scores as JSONL.
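The scoring and loss steps above can be sketched in plain Python using only the standard library, so the sketch runs without PyTorch. This is an illustrative re-implementation, not the repository code. The helper names log_sigmoid, last_token_scores and pairwise_loss are hypothetical, and nested lists stand in for the value-head output and attention-mask tensors. Slicing the batch into halves plays the role of torch.split, and indexing at sum(mask) - 1 plays the role of gather.

```python
import math
from typing import List, Sequence, Tuple


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)): avoids exp overflow for large |x|.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def last_token_scores(
    values: Sequence[Sequence[float]], masks: Sequence[Sequence[int]]
) -> List[float]:
    # Per-token rewards -> one scalar per sequence, taken at index sum(mask) - 1.
    # This is the last real token only for right-padded sequences.
    return [row[sum(mask) - 1] for row, mask in zip(values, masks)]


def pairwise_loss(
    values: Sequence[Sequence[float]], attention_mask: Sequence[Sequence[int]]
) -> Tuple[float, List[float], List[float]]:
    batch_size = len(values) // 2  # first half chosen, second half rejected
    chosen = last_token_scores(values[:batch_size], attention_mask[:batch_size])
    rejected = last_token_scores(values[batch_size:], attention_mask[batch_size:])
    # Bradley-Terry loss: -log sigmoid(chosen - rejected), averaged over pairs.
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses), chosen, rejected


# Two pairs (B = 2). Padding positions hold 9.9 so picking them by mistake is obvious.
values = [
    [0.1, 0.5, 2.0, 9.9],   # chosen 0
    [0.3, 1.0, 9.9, 9.9],   # chosen 1
    [0.2, -1.0, 9.9, 9.9],  # rejected 0
    [0.0, 0.4, 0.7, 9.9],   # rejected 1
]
mask = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
loss, chosen, rejected = pairwise_loss(values, mask)
print(chosen, rejected)  # [2.0, 1.0] [-1.0, 0.7]
print(loss)
```

The sum(mask) - 1 index assumes a contiguous, right-padded mask. With left padding, the same index would select the wrong position.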
Usage
Use PairwiseTrainer when training a reward model from human preference data in a pairwise (chosen vs. rejected) format. It is instantiated by the run_rm workflow function. It expects each input batch to hold the chosen examples in its first half and the corresponding rejected examples in its second half, so that in a batch of 2B rows, row i and row i + B form a pair.
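To make the expected layout concrete, here is a sketch of how B preference pairs could be flattened into 2B rows. build_pairwise_batch is a hypothetical helper, not the collator that run_rm actually uses. It places all chosen sequences first and all rejected sequences second, then right-pads every row to the longest sequence.

```python
from typing import Dict, List, Sequence, Tuple


def build_pairwise_batch(
    pairs: Sequence[Tuple[List[int], List[int]]], pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    # Hypothetical helper: illustrates the layout only, not LLaMA-Factory's collator.
    sequences = [chosen for chosen, _ in pairs] + [rejected for _, rejected in pairs]
    max_len = max(len(seq) for seq in sequences)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in sequences:
        pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * pad)        # right padding
        attention_mask.append([1] * len(seq) + [0] * pad)   # 1 = real token
    return {"input_ids": input_ids, "attention_mask": attention_mask}


pairs = [([5, 6, 7], [5, 8]), ([9, 10], [9, 11, 12, 13])]
batch = build_pairwise_batch(pairs, pad_id=0)
for row in batch["input_ids"]:
    print(row)
# [5, 6, 7, 0]     chosen 0
# [9, 10, 0, 0]    chosen 1
# [5, 8, 0, 0]     rejected 0
# [9, 11, 12, 13]  rejected 1
```

Right padding matters because compute_loss locates each sequence's final token from the attention mask.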
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/train/rm/trainer.py
- Lines: 1-150
Signature
```python
class PairwiseTrainer(Trainer):
    def __init__(
        self,
        finetuning_args: "FinetuningArguments",
        processor: Optional["ProcessorMixin"],
        **kwargs,
    ) -> None: ...

    def create_optimizer(self) -> "torch.optim.Optimizer": ...

    def create_scheduler(
        self,
        num_training_steps: int,
        optimizer: Optional["torch.optim.Optimizer"] = None,
    ) -> "torch.optim.lr_scheduler.LRScheduler": ...

    def compute_loss(
        self,
        model: "PreTrainedModel",
        inputs: dict[str, "torch.Tensor"],
        return_outputs: bool = False,
        **kwargs,
    ) -> Union["torch.Tensor", tuple["torch.Tensor", list["torch.Tensor"]]]: ...

    def save_predictions(self, predict_results: "PredictionOutput") -> None: ...
```
Import
```python
from llamafactory.train.rm.trainer import PairwiseTrainer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| finetuning_args | FinetuningArguments | Yes | Fine-tuning configuration including use_badam and disable_shuffling flags |
| processor | Optional[ProcessorMixin] | Yes (may be None) | Multimodal processor; if not None, a SaveProcessorCallback is added |
| **kwargs | dict | Yes | Passed to parent Trainer; must include model, args, data_collator, train_dataset, etc. |
Outputs
| Name | Type | Description |
|---|---|---|
| loss (from compute_loss) | torch.Tensor | Bradley-Terry loss: -logsigmoid(chosen_score - rejected_score).mean() |
| outputs (from compute_loss, optional) | tuple[torch.Tensor, list[torch.Tensor]] | When return_outputs=True, returns (loss, [loss, chosen_scores, rejected_scores]) |
| generated_predictions.jsonl (from save_predictions) | File | JSONL file with chosen and rejected reward scores per example |
Usage Examples
```python
# Typically instantiated by run_rm, not directly. model, training_args,
# finetuning_args, data_collator, callbacks, tokenizer, processor and the
# datasets are assumed to be prepared beforehand, as run_rm does.
from llamafactory.train.rm.metric import ComputeAccuracy
from llamafactory.train.rm.trainer import PairwiseTrainer

trainer = PairwiseTrainer(
    model=model,
    args=training_args,
    finetuning_args=finetuning_args,
    data_collator=data_collator,
    callbacks=callbacks,
    compute_metrics=ComputeAccuracy(),
    processor=processor,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Training
train_result = trainer.train()

# Prediction with score saving
predict_results = trainer.predict(eval_dataset)
trainer.save_predictions(predict_results)
# Each line of <output_dir>/generated_predictions.jsonl looks like:
# {"chosen": 1.23, "rejected": -0.45}
```
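Once generated_predictions.jsonl exists, a quick sanity check is the fraction of pairs in which the chosen score beats the rejected score. This is roughly the pairwise accuracy a metric such as ComputeAccuracy is meant to capture, though that correspondence is an assumption here. pairwise_accuracy is a hypothetical helper that accepts any iterable of JSONL lines, including an open file handle. The second sample line is illustrative, not real output.

```python
import json
from typing import Iterable


def pairwise_accuracy(lines: Iterable[str]) -> float:
    # Fraction of records whose "chosen" score is strictly greater than "rejected".
    # Blank lines are skipped in case the file ends without a trailing newline.
    total = correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        total += 1
        correct += record["chosen"] > record["rejected"]
    return correct / total if total else 0.0


sample = [
    '{"chosen": 1.23, "rejected": -0.45}',
    '{"chosen": 0.10, "rejected": 0.30}',  # illustrative misranked pair
]
print(pairwise_accuracy(sample))  # 0.5
```

To score a real run, pass the open file, e.g. `pairwise_accuracy(open(path, encoding="utf-8"))`. Because the comparison is strict, ties count as misses.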
Related Pages
- Hiyouga_LLaMA_Factory_RM_Workflow - The workflow orchestrator that creates and drives PairwiseTrainer
- Hiyouga_LLaMA_Factory_RM_Metric - ComputeAccuracy metric used with PairwiseTrainer
- Hiyouga_LLaMA_Factory_Callbacks - FixValueHeadModelCallback and SaveProcessorCallback used internally
- Hiyouga_LLaMA_Factory_PPO_Workflow - PPO training that consumes the reward models produced here