Principle: CarperAI trlx Evaluation Metrics Design
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, NLP, Reinforcement_Learning |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
A design principle for creating evaluation metric functions that monitor language model quality during RL or SFT training.
Description
During training, periodic evaluation provides insight into model behavior beyond the training loss or reward signal. Evaluation metric functions generate text from held-out prompts and compute quality metrics on the generated outputs. Unlike reward functions (which drive optimization), metric functions are observational — they log statistics for monitoring without influencing gradient updates.
In trlx, the metric function is called during evaluation intervals on batches of generated text. It returns a dictionary mapping metric names to per-sample scores, which are then logged to trackers (Weights & Biases, TensorBoard). This allows tracking multiple dimensions of quality simultaneously (e.g., sentiment, fluency, diversity).
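A minimal sketch of such a metric function, assuming the `metric_fn(samples, **kwargs)` signature used in trlx's examples; the `POSITIVE_WORDS` list and the metric names are purely illustrative stand-ins for real quality measures:

```python
from typing import Dict, List

# Illustrative keyword set; a real setup would use a sentiment classifier.
POSITIVE_WORDS = {"good", "great", "excellent", "happy"}


def metric_fn(samples: List[str], **kwargs) -> Dict[str, List[float]]:
    """Return one score per sample for each named metric."""
    lengths = [float(len(s.split())) for s in samples]
    positivity = [
        sum(w.lower() in POSITIVE_WORDS for w in s.split()) / max(len(s.split()), 1)
        for s in samples
    ]
    # One list per metric, one value per sample; each list is logged
    # under its key to the configured tracker.
    return {"length": lengths, "positivity": positivity}
```

Returning per-sample lists (rather than a single aggregate) lets the tracker compute means while still allowing inspection of individual generations.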
Usage
Design a metric function when you need to monitor generation quality during training beyond the primary reward signal. Pass it as the metric_fn argument to trlx.train(). Metric functions are particularly important for offline training (ILQL, SFT) where there is no live reward function, and for detecting reward hacking in PPO.
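Wiring this up might look like the sketch below, modeled on trlx's offline-training examples; the variable names (`train_samples`, `train_rewards`, `eval_prompts`) are placeholders, and the exact keyword arguments of `trlx.train` depend on the trlx version:

```python
def metric_fn(samples, **kwargs):
    # trlx calls this on batches of generated text at evaluation intervals
    # and logs the returned {name: per-sample scores} dictionary.
    return {"length": [float(len(s.split())) for s in samples]}


# Hedged illustration of the call site, assuming the ILQL-style API:
# import trlx
# trainer = trlx.train(
#     samples=train_samples,    # offline demonstrations
#     rewards=train_rewards,    # offline reward labels
#     eval_prompts=eval_prompts,
#     metric_fn=metric_fn,      # observational only; no gradient influence
# )
```

Because the metric function is observational, it can be swapped or extended between runs without changing the optimization problem.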
Theoretical Basis
Evaluation metrics serve as a multi-dimensional assessment: a metric function maps a batch of samples to a set of scores, metrics(samples) = {m_1(samples), ..., m_k(samples)}, where each metric m_i measures a different quality dimension.
Design principles:
- Independence from reward: Metrics should measure aspects not captured by the reward signal
- Interpretability: Each metric should have a clear meaning (e.g., "sentiment score", "ROUGE-L")
- Per-sample granularity: Return one value per sample for detailed analysis
- Efficiency: Called periodically, so batch processing is preferred
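The per-sample granularity and efficiency principles can be illustrated with a distinct-n diversity metric, a standard measure of repetitiveness; the function name `distinct_n` and the whitespace tokenization are our simplifying assumptions:

```python
from typing import List


def distinct_n(samples: List[str], n: int = 2) -> List[float]:
    """Per-sample ratio of unique n-grams to total n-grams (higher = more diverse)."""
    scores = []
    for text in samples:
        tokens = text.split()  # simplistic tokenization for illustration
        ngrams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
        # Guard against outputs shorter than n tokens.
        scores.append(len(set(ngrams)) / len(ngrams) if ngrams else 0.0)
    return scores
```

The metric is independent of a sentiment- or task-based reward, has a clear interpretation, returns one value per sample, and processes the whole batch in a single call, satisfying all four principles above.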