

Implementation:Hiyouga LLaMA Factory RM Trainer

From Leeroopedia


Knowledge Sources
Domains: Reward Modeling, RLHF, Trainer
Last Updated: 2026-02-06 19:00 GMT

Overview

PairwiseTrainer is a custom HuggingFace Trainer subclass that implements Bradley-Terry pairwise loss for reward model training.
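For reference, the standard Bradley-Terry formulation can be written as follows. The symbols are notational assumptions, not names from the source: r_θ is the scalar reward, x a prompt, y_c and y_r the chosen and rejected responses, and B the number of pairs in a batch.

```latex
P(y_c \succ y_r \mid x) = \sigma\bigl(r_\theta(x, y_c) - r_\theta(x, y_r)\bigr),
\qquad
\mathcal{L}(\theta) = -\frac{1}{B} \sum_{i=1}^{B}
  \log \sigma\bigl(r_\theta(x_i, y_{c,i}) - r_\theta(x_i, y_{r,i})\bigr)
```

Minimizing this loss pushes the chosen score above the rejected score for each pair.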

Description

PairwiseTrainer extends the HuggingFace Trainer class and overrides compute_loss to do three things: split a concatenated batch into chosen and rejected halves, extract a scalar reward score for each sequence at its last non-padding token position using gather, and compute the negative log-sigmoid of the score difference as the training loss. The class also uses FixValueHeadModelCallback for proper value-head checkpointing, supports custom optimizers and schedulers, removes duplicate tensors in _save for safetensors compatibility, and provides a save_predictions method that writes chosen/rejected reward scores as JSONL.
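The three compute_loss steps described above can be sketched in isolation. This is a minimal illustration, not the trainer's actual code: the function name pairwise_loss is hypothetical, and it assumes right-padded sequences and a per-token value tensor of shape (2B, T), as a value-head model would produce.

```python
import torch
import torch.nn.functional as F


def pairwise_loss(values: torch.Tensor, attention_mask: torch.Tensor):
    """Hypothetical sketch. values and attention_mask have shape (2B, T):
    the first B rows are chosen examples, the last B rows are rejected ones."""
    batch_size = values.size(0) // 2
    chosen_values, rejected_values = torch.split(values, batch_size, dim=0)
    chosen_mask, rejected_mask = torch.split(attention_mask, batch_size, dim=0)

    # Index of the last non-padding token per row (assumes right padding).
    chosen_last = chosen_mask.long().sum(dim=-1, keepdim=True) - 1
    rejected_last = rejected_mask.long().sum(dim=-1, keepdim=True) - 1

    # gather picks one scalar per sequence at that index.
    chosen_scores = chosen_values.gather(dim=-1, index=chosen_last).squeeze(-1)
    rejected_scores = rejected_values.gather(dim=-1, index=rejected_last).squeeze(-1)

    # Bradley-Terry loss: negative log-sigmoid of the score difference.
    loss = -F.logsigmoid(chosen_scores.float() - rejected_scores.float()).mean()
    return loss, chosen_scores, rejected_scores


# Toy batch with B = 2 pairs and T = 3 tokens.
values = torch.tensor(
    [[0.0, 2.0, 9.0], [0.0, 0.0, 1.0], [0.0, 9.0, 9.0], [3.0, 0.0, 1.0]]
)
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1]])
loss, chosen_scores, rejected_scores = pairwise_loss(values, attention_mask)
```

Reading the score at the last non-padding token means padding positions never influence the reward, whatever values the head emits there.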

Usage

Use PairwiseTrainer when training a reward model from human preference data in a pairwise (chosen vs. rejected) format. It is instantiated by the run_rm workflow function and expects input batches where the first half contains chosen examples and the second half contains rejected examples.
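The expected batch layout (chosen examples first, rejected examples second) can be illustrated with a small collation sketch. The helper concat_pairs and the keys chosen_ids and rejected_ids are hypothetical names chosen for this example; the data collator actually used by run_rm may differ.

```python
def concat_pairs(pairs: list[dict[str, list[int]]], pad_id: int = 0) -> dict[str, list[list[int]]]:
    """Hypothetical helper: stack all chosen sequences, then all rejected ones,
    and right-pad every row to the longest sequence in the batch."""
    sequences = [p["chosen_ids"] for p in pairs] + [p["rejected_ids"] for p in pairs]
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


pairs = [
    {"chosen_ids": [1, 2, 3], "rejected_ids": [4]},
    {"chosen_ids": [5, 6], "rejected_ids": [7, 8]},
]
batch = concat_pairs(pairs)
# Rows 0-1 hold the chosen examples, rows 2-3 the rejected ones, so row i
# and row i + len(pairs) belong to the same preference pair.
```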

Code Reference

Source Location

Signature

class PairwiseTrainer(Trainer):
    def __init__(
        self,
        finetuning_args: "FinetuningArguments",
        processor: Optional["ProcessorMixin"],
        **kwargs,
    ) -> None

    def create_optimizer(self) -> "torch.optim.Optimizer"

    def create_scheduler(
        self,
        num_training_steps: int,
        optimizer: Optional["torch.optim.Optimizer"] = None,
    ) -> "torch.optim.lr_scheduler.LRScheduler"

    def compute_loss(
        self,
        model: "PreTrainedModel",
        inputs: dict[str, "torch.Tensor"],
        return_outputs: bool = False,
        **kwargs,
    ) -> Union["torch.Tensor", tuple["torch.Tensor", list["torch.Tensor"]]]

    def save_predictions(self, predict_results: "PredictionOutput") -> None

Import

from llamafactory.train.rm.trainer import PairwiseTrainer

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| finetuning_args | FinetuningArguments | Yes | Fine-tuning configuration, including the use_badam and disable_shuffling flags |
| processor | Optional[ProcessorMixin] | Yes (may be None) | Multimodal processor; if provided, a SaveProcessorCallback is added |
| **kwargs | dict | Yes | Passed to the parent Trainer; must include model, args, data_collator, train_dataset, etc. |

Outputs

| Name | Type | Description |
|---|---|---|
| loss (from compute_loss) | torch.Tensor | Bradley-Terry loss: -logsigmoid(chosen_score - rejected_score).mean() |
| outputs (from compute_loss, optional) | tuple[torch.Tensor, list[torch.Tensor]] | When return_outputs=True, returns (loss, [loss, chosen_scores, rejected_scores]) |
| generated_predictions.jsonl (from save_predictions) | File | JSONL file with chosen and rejected reward scores per example |
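The JSONL output can be pictured with the sketch below, which writes one JSON object per preference pair. It is an illustration only: write_scores is a hypothetical helper, and rounding scores to two decimals is an assumption made here, not something the source confirms about save_predictions.

```python
import json
import os


def write_scores(output_dir: str, chosen_scores: list[float], rejected_scores: list[float]) -> str:
    """Hypothetical helper: write one {"chosen", "rejected"} record per line."""
    path = os.path.join(output_dir, "generated_predictions.jsonl")
    with open(path, "w", encoding="utf-8") as writer:
        for chosen, rejected in zip(chosen_scores, rejected_scores):
            # Rounding to two decimals is an assumption made for readability.
            record = {"chosen": round(float(chosen), 2), "rejected": round(float(rejected), 2)}
            writer.write(json.dumps(record) + "\n")
    return path
```

Reading the file back line by line with json.loads recovers the per-pair scores, which makes it straightforward to compute accuracy (chosen > rejected) offline.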

Usage Examples

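# Typically instantiated by run_rm rather than directly.
# model, training_args, finetuning_args, data_collator, callbacks, processor,
# tokenizer, train_dataset, and eval_dataset are assumed to be prepared beforehand.
from llamafactory.train.rm.metric import ComputeAccuracy
from llamafactory.train.rm.trainer import PairwiseTrainer

trainer = PairwiseTrainer(
    model=model,
    args=training_args,
    finetuning_args=finetuning_args,
    data_collator=data_collator,
    callbacks=callbacks,
    compute_metrics=ComputeAccuracy(),
    processor=processor,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Training
train_result = trainer.train()

# Prediction with score saving
predict_results = trainer.predict(eval_dataset)
trainer.save_predictions(predict_results)
# Each line of generated_predictions.jsonl has the form (illustrative values):
# {"chosen": 1.23, "rejected": -0.45}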

Related Pages

Page Connections
