Implementation:CarperAI Trlx Reference Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Infrastructure |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
A CLI tool that runs benchmark comparisons between two git branches and automatically generates a Weights & Biases (W&B) report.
Description
The reference module is a CLI tool that compares benchmark metrics between two git branches. For each branch, it checks out the code and runs scripts/benchmark.sh unless a completed run already exists in W&B (identified by a content hash tag). It then collects metrics from the W&B runs and generates a comparison report with line plots: reward/mean and metric/mean are displayed first, followed by all remaining metrics. The report is built with W&B's report API, which is used to create parallel comparison panels.
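The two decisions described above (whether a branch still needs a benchmark run, and the order in which metrics are plotted) can be sketched in plain Python. This is an illustrative sketch, not the code in trlx/reference.py: the helper names `order_metrics` and `needs_benchmark` are hypothetical, the content hash is treated as an opaque string, and sorting the remaining metrics alphabetically is an assumption.

```python
from typing import Iterable, List, Set

# Metrics the description says are plotted first, in this order.
PRIORITY_METRICS = ("reward/mean", "metric/mean")


def order_metrics(metric_names: Iterable[str]) -> List[str]:
    """Return the priority metrics first (when present), then all others.

    Alphabetical ordering of the remaining metrics is an assumption
    made for this sketch.
    """
    names = set(metric_names)
    first = [m for m in PRIORITY_METRICS if m in names]
    rest = sorted(names - set(first))
    return first + rest


def needs_benchmark(content_hash: str, existing_run_tags: Iterable[Set[str]]) -> bool:
    """Return True when no existing run is tagged with the content hash.

    `existing_run_tags` holds one set of tags per run already logged.
    """
    return not any(content_hash in tags for tags in existing_run_tags)


print(order_metrics(["loss", "metric/mean", "kl", "reward/mean"]))
# → ['reward/mean', 'metric/mean', 'kl', 'loss']
```

Keeping both checks as pure functions makes them easy to test without W&B access; in the real tool the tags and metric names would come from W&B runs.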
Usage
Use this CLI tool to generate reproducible benchmark comparisons between a feature branch and a reference branch (default: CarperAI/trlx:main). It requires W&B credentials and a configured benchmark script.
Code Reference
Source Location
- Repository: CarperAI_Trlx
- File: trlx/reference.py
- Lines: 1-103
Signature
# CLI script, no public class API.
# Entry point: python -m trlx.reference <branch> [--against <ref>] [--public]
# Key arguments:
# branch: str - Git branch in "origin:branch" format
# --against: str - Reference branch (default "CarperAI/trlx:main")
# --public: flag - Use CarperAI W&B entity
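The arguments listed above map naturally onto an `argparse` parser. The following is a minimal sketch under that assumption; the function name `build_parser` and the help strings are illustrative and are not taken from trlx/reference.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI arguments (sketch only)."""
    parser = argparse.ArgumentParser(prog="python -m trlx.reference")
    # Positional: branch to benchmark, in "origin:branch" format.
    parser.add_argument("branch", type=str, help="Git branch in origin:branch format")
    # Optional: reference branch to compare against.
    parser.add_argument("--against", type=str, default="CarperAI/trlx:main",
                        help="Reference branch")
    # Optional flag: use the CarperAI W&B entity.
    parser.add_argument("--public", action="store_true",
                        help="Use the CarperAI W&B entity")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["origin:my-feature"])
print(args.branch, args.against, args.public)
# → origin:my-feature CarperAI/trlx:main False
```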
Import
# CLI usage only:
# python -m trlx.reference origin:my-feature --against CarperAI/trlx:main
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| branch | str (CLI positional) | Yes | Git branch to benchmark (format: "origin:branch") |
| --against | str (CLI) | No | Reference branch (default "CarperAI/trlx:main") |
| --public | flag (CLI) | No | Use the CarperAI W&B entity instead of the personal one |
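Both branch arguments use the "origin:branch" format, so a caller may want to validate a value before invoking the tool. A minimal sketch of splitting such a string follows; `BranchSpec` and `parse_branch_spec` are hypothetical names introduced for illustration, and splitting on the first colon is an assumption about how the format is read.

```python
from typing import NamedTuple


class BranchSpec(NamedTuple):
    origin: str  # e.g. "origin" or "CarperAI/trlx"
    branch: str  # e.g. "main"


def parse_branch_spec(spec: str) -> BranchSpec:
    """Split an "origin:branch" string into its two parts.

    Raises ValueError when the colon or either part is missing.
    """
    origin, sep, branch = spec.partition(":")
    if not sep or not origin or not branch:
        raise ValueError(f"expected 'origin:branch', got {spec!r}")
    return BranchSpec(origin, branch)


print(parse_branch_spec("CarperAI/trlx:main"))
# → BranchSpec(origin='CarperAI/trlx', branch='main')
```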
Outputs
| Name | Type | Description |
|---|---|---|
| W&B report | URL | Comparison report with metric line plots |
| Benchmark runs | W&B runs | Benchmark results tagged with content hash |
Usage Examples
Compare Feature Branch Against Main
# Compare your feature branch against main
python -m trlx.reference origin:my-feature-branch
# Compare against a specific reference branch
python -m trlx.reference origin:my-feature --against origin:release-v1.0
# Use CarperAI public W&B entity
python -m trlx.reference origin:my-feature --public
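For scripted use (for example, comparing several feature branches in a loop), the invocations above can be assembled programmatically. The sketch below only builds the argument list from the flags documented on this page; `build_command` is a hypothetical helper, and actually running the command assumes trlx is installed and W&B credentials are configured.

```python
import sys
from typing import List, Optional


def build_command(branch: str, against: Optional[str] = None,
                  public: bool = False) -> List[str]:
    """Assemble the argv list for `python -m trlx.reference` (sketch only)."""
    cmd = [sys.executable, "-m", "trlx.reference", branch]
    if against is not None:
        # Omit --against to fall back on the tool's default reference branch.
        cmd += ["--against", against]
    if public:
        cmd.append("--public")
    return cmd


# To execute, pass the list to subprocess, for example:
#   subprocess.run(build_command("origin:my-feature"), check=True)
print(build_command("origin:my-feature", against="origin:release-v1.0")[1:])
# → ['-m', 'trlx.reference', 'origin:my-feature', '--against', 'origin:release-v1.0']
```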