
Implementation:NVIDIA NeMo Aligner RM Get Loss And Metrics

From Leeroopedia


Implementation Details

  • Name: RM_Get_Loss_And_Metrics
  • Type: API Doc
  • Implements: Reward_Model_Validation
  • Repository: NeMo Aligner
  • Primary File: nemo_aligner/models/nlp/gpt/megatron_gpt_reward_model.py
  • Domains: NLP, Evaluation
  • Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by the MegatronGPTRewardModel class, for computing reward model validation metrics, including ranking loss and accuracy.

Description

The get_loss_and_metrics method of MegatronGPTRewardModel runs the forward-backward function on preference pairs, gathers chosen and rejected rewards across distributed ranks, computes the Bradley-Terry ranking loss, and returns comprehensive validation metrics. The method handles pipeline parallelism by broadcasting results from the last pipeline stage. Metrics include loss, ranking accuracy, mean chosen/rejected rewards, and reward distribution statistics.
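The core of the Bradley-Terry ranking objective can be sketched in a few lines. This is an illustrative, pure-Python restatement of the loss and accuracy described above, not the NeMo Aligner source (which runs inside Megatron's forward-backward function on tensors gathered across ranks); the function name and list-based inputs are assumptions for clarity.

```python
import math


def bradley_terry_loss(rewards_chosen, rewards_rejected):
    """Illustrative Bradley-Terry ranking loss and accuracy.

    loss = -log(sigmoid(r_chosen - r_rejected)), averaged over pairs;
    accuracy is the fraction of pairs where the chosen reward is higher.
    """
    losses = []
    correct = 0
    for rc, rr in zip(rewards_chosen, rewards_rejected):
        diff = rc - rr
        # Numerically stable -log(sigmoid(diff)):
        #   diff >= 0: log(1 + exp(-diff))
        #   diff <  0: -diff + log(1 + exp(diff))
        if diff >= 0:
            losses.append(math.log1p(math.exp(-diff)))
        else:
            losses.append(-diff + math.log1p(math.exp(diff)))
        correct += diff > 0
    n = len(losses)
    return sum(losses) / n, correct / n


loss, acc = bradley_terry_loss([1.2, 0.8, 2.0], [-0.3, 1.1, 0.5])
```

A loss near zero means the model strongly separates chosen from rejected responses; an accuracy of 0.5 is chance level for a random reward model.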

Usage

Called by SupervisedTrainer during validation steps with forward_only=True. Also called during training steps with forward_only=False for backpropagation.

Code Reference

Source Location

  • Repository: NeMo Aligner
  • File: nemo_aligner/models/nlp/gpt/megatron_gpt_reward_model.py
  • Lines: 249-322

Signature

class MegatronGPTRewardModel(MegatronGPTModel, SupervisedInterface, Inferrable):
    def get_loss_and_metrics(
        self,
        batch: dict,
        forward_only: bool = True,
    ) -> Tuple[torch.Tensor, Dict[str, float]]:
        """Compute reward model loss and validation metrics.

        Args:
            batch: Dict with 'chosen', 'rejected', 'chosen_length', 'rejected_length'
            forward_only: If True, no gradient computation (validation mode)

        Returns:
            loss_mean: Scalar loss averaged across microbatches
            metrics: Dict with 'loss', 'acc', 'rewards_chosen_mean',
                    'rewards_rejected_mean', 'rewards_all_mean', 'rewards_all_std'
        """

Import

from nemo_aligner.models.nlp.gpt.megatron_gpt_reward_model import MegatronGPTRewardModel

I/O Contract

Inputs

  • batch (dict, required): Contains chosen, rejected (token tensors), chosen_length, rejected_length
  • forward_only (bool, optional; default True): Validation mode (no gradients) when True

Outputs

  • loss_mean (torch.Tensor): Scalar ranking loss
  • metrics (Dict[str, float]): loss, acc, rewards_chosen_mean, rewards_rejected_mean, rewards_all_mean, rewards_all_std
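Because the metrics dict maps straight to floats, per-batch results from a validation loop can be averaged with a small helper. This is a hedged sketch of what a caller might do with the output contract; `aggregate_metrics` is a hypothetical helper, not part of NeMo Aligner.

```python
def aggregate_metrics(per_batch_metrics):
    """Average a list of per-batch metric dicts key-by-key.

    Illustrative helper for consumers of get_loss_and_metrics output;
    assumes every dict shares the same keys.
    """
    keys = per_batch_metrics[0].keys()
    n = len(per_batch_metrics)
    return {k: sum(m[k] for m in per_batch_metrics) / n for k in keys}


agg = aggregate_metrics([
    {"loss": 0.4, "acc": 0.7},
    {"loss": 0.3, "acc": 0.8},
])
```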

Usage Examples

# Called internally by SupervisedTrainer.validation_step
model.prepare_for_validation_step()
loss_mean, metrics = model.get_loss_and_metrics(batch=val_batch, forward_only=True)
model.finish_validation_step()

# metrics contains:
# {"loss": 0.35, "acc": 0.72, "rewards_chosen_mean": 1.2,
#  "rewards_rejected_mean": -0.3, "rewards_all_mean": 0.45, "rewards_all_std": 0.8}
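The reward distribution statistics in the example above pool chosen and rejected rewards. A minimal sketch of how those fields could be derived, using made-up reward values; the real method computes them on tensors gathered across distributed ranks, so this stdlib version is only illustrative.

```python
import statistics

# Made-up per-sample rewards; in practice these come from the model forward pass.
rewards_chosen = [1.2, 0.8, 2.0]
rewards_rejected = [-0.3, 1.1, 0.5]
all_rewards = rewards_chosen + rewards_rejected

# Distribution statistics mirroring the metric names in the I/O contract.
stats = {
    "rewards_chosen_mean": statistics.fmean(rewards_chosen),
    "rewards_rejected_mean": statistics.fmean(rewards_rejected),
    "rewards_all_mean": statistics.fmean(all_rewards),
    "rewards_all_std": statistics.pstdev(all_rewards),
}
```

A large gap between rewards_chosen_mean and rewards_rejected_mean with a moderate rewards_all_std is the typical signature of a well-separated reward model.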
