
Implementation:CarperAI Trlx Reference Benchmark

From Leeroopedia


Knowledge Sources
Domains: Benchmarking, Infrastructure
Last Updated: 2026-02-07 16:00 GMT

Overview

A concrete tool for running benchmark comparisons between git branches, with automatic W&B report generation.

Description

The reference module is a CLI tool that compares benchmark metrics between two git branches. It checks out each branch and runs scripts/benchmark.sh unless a completed run already exists, which it detects through a content hash tag in W&B. It then collects metrics from the W&B runs and generates a comparison report with line plots: reward/mean and metric/mean are displayed first, followed by all remaining metrics. The report is built with W&B's report API, which creates parallel comparison panels.
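The metric ordering described above can be illustrated with a small sketch. This is not the module's actual code: the function name `order_metrics` and the constant `PRIORITY_METRICS` are hypothetical, and sorting the remaining metrics alphabetically is an assumption made here only to keep the output deterministic.

```python
from typing import Iterable, List

# Metrics the description says are displayed first, in this order.
PRIORITY_METRICS = ("reward/mean", "metric/mean")


def order_metrics(metric_names: Iterable[str]) -> List[str]:
    """Return metric names with the priority metrics first, then the rest.

    Assumption: the remaining metrics are sorted alphabetically; the real
    tool may use a different order for them.
    """
    names = set(metric_names)
    first = [m for m in PRIORITY_METRICS if m in names]
    rest = sorted(names - set(first))
    return first + rest


if __name__ == "__main__":
    logged = ["loss/total", "metric/mean", "kl/mean", "reward/mean"]
    print(order_metrics(logged))
    # ['reward/mean', 'metric/mean', 'kl/mean', 'loss/total']
```

Each name in the returned list would then correspond to one line-plot panel in the report.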

Usage

Use this CLI tool to generate reproducible benchmark comparisons between a feature branch and a reference branch (default: CarperAI/trlx:main). It requires W&B credentials and a configured benchmark script.

Code Reference

Source Location

Signature

# CLI script, no public class API.
# Entry point: python -m trlx.reference <branch> [--against <ref>] [--public]

# Key arguments:
#   branch: str       - Git branch in "origin:branch" format
#   --against: str    - Reference branch (default "CarperAI/trlx:main")
#   --public: flag    - Use CarperAI W&B entity
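For readers who want to see how the arguments listed above fit together, here is a minimal argparse sketch. It mirrors only the names, defaults, and descriptions given on this page; the helper name `build_parser` is hypothetical, and the real module may define its parser differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the arguments documented on this page.

    Hypothetical helper for illustration, not the module's own code.
    """
    parser = argparse.ArgumentParser(prog="python -m trlx.reference")
    parser.add_argument(
        "branch", type=str, help='Git branch in "origin:branch" format'
    )
    parser.add_argument(
        "--against",
        type=str,
        default="CarperAI/trlx:main",
        help="Reference branch",
    )
    parser.add_argument(
        "--public", action="store_true", help="Use CarperAI W&B entity"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["origin:my-feature"])
    print(args.branch, args.against, args.public)
```

With only the positional argument supplied, `--against` falls back to its documented default and `--public` is off.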

Import

# CLI usage only:
# python -m trlx.reference origin:my-feature --against CarperAI/trlx:main

I/O Contract

Inputs

Name | Type | Required | Description
branch | str (CLI positional) | Yes | Git branch to benchmark (format: "origin:branch")
--against | str (CLI) | No | Reference branch (default "CarperAI/trlx:main")
--public | flag (CLI) | No | Use CarperAI W&B entity instead of personal
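The "origin:branch" format used by both branch arguments can be split as in the sketch below. `BranchSpec` and `parse_branch_spec` are hypothetical names introduced for illustration; how the tool itself parses the string is not documented here.

```python
from typing import NamedTuple


class BranchSpec(NamedTuple):
    """Hypothetical container for the two halves of "origin:branch"."""

    origin: str
    branch: str


def parse_branch_spec(spec: str) -> BranchSpec:
    """Split an "origin:branch" string at the first colon.

    Raises ValueError if the colon or either half is missing.
    """
    origin, sep, branch = spec.partition(":")
    if not sep or not origin or not branch:
        raise ValueError(f'expected "origin:branch", got {spec!r}')
    return BranchSpec(origin=origin, branch=branch)


if __name__ == "__main__":
    print(parse_branch_spec("CarperAI/trlx:main"))
    print(parse_branch_spec("origin:my-feature"))
```

Validating the format up front gives a clear error before any checkout or benchmark run is attempted.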

Outputs

Name | Type | Description
W&B report | URL | Comparison report with metric line plots
Benchmark runs | W&B runs | Benchmark results tagged with content hash
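The idea of tagging runs with a content hash, so that an unchanged benchmark is not rerun, can be sketched as follows. This is an assumption-laden illustration: which files the tool hashes, which hash function it uses, and the tag length are not stated on this page, and `content_hash_tag` and `needs_run` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def content_hash_tag(paths: Iterable[Path], length: int = 8) -> str:
    """Derive a short, order-independent tag from file names and contents.

    Assumptions: SHA-256 and an 8-character tag are illustrative choices.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")  # separator between name and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()[:length]


def needs_run(tag: str, existing_tags: Iterable[str]) -> bool:
    """Return True when no existing run carries this content hash tag."""
    return tag not in set(existing_tags)
```

In this sketch, a caller would compute the tag for the checked-out branch, compare it with the tags of existing runs, and launch the benchmark only when `needs_run` returns True.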

Usage Examples

Compare Feature Branch Against Main

# Compare your feature branch against main
python -m trlx.reference origin:my-feature-branch

# Compare against a specific reference branch
python -m trlx.reference origin:my-feature --against origin:release-v1.0

# Use CarperAI public W&B entity
python -m trlx.reference origin:my-feature --public
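When the comparison is launched from another Python script (for example in CI), the same command lines can be assembled programmatically. The helper `build_command` below is hypothetical and only composes the documented arguments; it assumes trlx is installed in the current interpreter's environment.

```python
import shlex
import sys
from typing import List, Optional


def build_command(
    branch: str, against: Optional[str] = None, public: bool = False
) -> List[str]:
    """Assemble the argv list for `python -m trlx.reference`.

    Hypothetical helper: it uses only the arguments documented on this page.
    """
    cmd = [sys.executable, "-m", "trlx.reference", branch]
    if against is not None:
        cmd += ["--against", against]
    if public:
        cmd.append("--public")
    return cmd


if __name__ == "__main__":
    cmd = build_command("origin:my-feature", against="origin:release-v1.0")
    # Print the command rather than executing it; to run it, pass `cmd`
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell quoting issues with branch names.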

Related Pages
