Principle:CarperAI Trlx RL Data Structures

Knowledge Sources	CarperAI_Trlx
Domains	Data_Structures, Reinforcement_Learning
Last Updated	2026-02-07 16:00 GMT

Overview

Design pattern for defining typed data containers that standardize the interface between RL data pipelines and trainers.

Description

RL training pipelines pass structured data (prompts, generated tokens, rewards, log-probabilities, values) between components: data pipelines produce prompt batches, models produce completions with log-probabilities, and trainers consume the combined rollout data. Typed dataclasses give each of these boundaries consistent field names and documented tensor shapes and types. A missing or misnamed field then fails as soon as the object is constructed, instead of surfacing later as a confusing error deep inside the trainer. Separate single-element and batch-element types support both per-sample processing and efficient batched operations.
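As an illustration of the single-element/batch-element split, here is a minimal sketch of collating individual prompts into one padded batch. The names PromptElement, PromptBatch, collate_prompts, the attention_mask field and the default pad id of 0 are hypothetical choices for this example, not a documented trlx API. The sketch pads on the right; generation pipelines for decoder-only models often pad on the left instead.

```python
from dataclasses import dataclass
from typing import List

import torch
from torch.nn.utils.rnn import pad_sequence


@dataclass
class PromptElement:
    # One prompt: raw text plus its token ids, shape [num_tokens]
    text: str
    tokens: torch.Tensor


@dataclass
class PromptBatch:
    # Many prompts: texts plus right-padded token ids, shape [batch_size, max_tokens]
    text: List[str]
    tokens: torch.Tensor
    attention_mask: torch.Tensor  # hypothetical field: 1 = real token, 0 = padding


def collate_prompts(elements: List[PromptElement], pad_id: int = 0) -> PromptBatch:
    """Hypothetical helper: stack single prompt elements into one padded batch."""
    tokens = pad_sequence([e.tokens for e in elements], batch_first=True, padding_value=pad_id)
    lengths = torch.tensor([len(e.tokens) for e in elements])
    # Position j in row i is a real token iff j < length of prompt i
    attention_mask = (torch.arange(tokens.shape[1]).unsqueeze(0) < lengths.unsqueeze(1)).long()
    return PromptBatch(text=[e.text for e in elements], tokens=tokens, attention_mask=attention_mask)


batch = collate_prompts([
    PromptElement("hi", torch.tensor([5, 6])),
    PromptElement("hello there", torch.tensor([7, 8, 9])),
])
print(batch.tokens.tolist())          # [[5, 6, 0], [7, 8, 9]]
print(batch.attention_mask.tolist())  # [[1, 1, 0], [1, 1, 1]]
```

Keeping the per-prompt type unpadded and moving all padding into the collate step means per-sample code never has to reason about pad tokens.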

Usage

Use this principle when designing the data flow between RL training components. Define dataclasses for each data interchange point (prompts, rollout elements, training batches) with explicit type annotations for tensor shapes.
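As a sketch of one such interchange point, the hypothetical RolloutElement below describes a single rollout handed to a trainer. Its field names and the __post_init__ shape check are assumptions for illustration, not trlx's actual schema. The dataclass constructor already rejects unknown or missing field names; the extra check makes the documented "same length per token" contract fail loudly at construction time.

```python
from dataclasses import dataclass

import torch


@dataclass
class RolloutElement:
    """Hypothetical schema: every per-token field has shape [response_len]."""
    response_tokens: torch.Tensor  # generated token ids
    logprobs: torch.Tensor         # log-probability of each generated token under the policy
    values: torch.Tensor           # value estimate at each generated position
    rewards: torch.Tensor          # per-token reward

    def __post_init__(self) -> None:
        # Reject mismatched shapes here rather than deep inside the trainer
        expected = self.response_tokens.shape
        for name in ("logprobs", "values", "rewards"):
            shape = getattr(self, name).shape
            if shape != expected:
                raise ValueError(f"{name} has shape {tuple(shape)}, expected {tuple(expected)}")


ok = RolloutElement(torch.tensor([1, 2, 3]), torch.zeros(3), torch.zeros(3), torch.zeros(3))

try:
    RolloutElement(torch.tensor([1, 2, 3]), torch.zeros(2), torch.zeros(3), torch.zeros(3))
except ValueError as err:
    print(err)  # logprobs has shape (2,), expected (3,)
```

A misspelled keyword such as logprob=... raises TypeError from the generated __init__ before __post_init__ even runs.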

Theoretical Basis

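The design follows the Data Transfer Object (DTO) pattern:

Single Element: Represents one data point (one prompt or one rollout).
Batch Element: Represents a batch of data points, with the same fields plus a leading batch dimension.
Type Safety: TensorType annotations document expected shapes at the type level; they are not enforced at runtime by default.
Fixed Schema: Each dataclass declares a fixed set of named fields, so a missing or misspelled field raises an error at construction. Plain dataclasses are still mutable: @dataclass(frozen=True) blocks field reassignment, but tensors stored in the fields can still be modified in place.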
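What a dataclass does and does not protect against can be checked directly. This sketch uses two throwaway classes (MutableElement and FrozenElement are names invented for the example) to compare a plain dataclass with a frozen one holding the same tensor field.

```python
import dataclasses
from dataclasses import dataclass

import torch


@dataclass
class MutableElement:
    tokens: torch.Tensor


@dataclass(frozen=True)
class FrozenElement:
    tokens: torch.Tensor


m = MutableElement(torch.tensor([1, 2, 3]))
m.tokens = torch.tensor([9])  # allowed: a plain dataclass permits field reassignment

f = FrozenElement(torch.tensor([1, 2, 3]))
try:
    f.tokens = torch.tensor([9])  # rejected: frozen dataclasses forbid reassignment
except dataclasses.FrozenInstanceError:
    print("reassignment blocked")

f.tokens[0] = 42  # still allowed: freezing does not make the tensor itself read-only
print(f.tokens.tolist())  # [42, 2, 3]

# The idiomatic way to "change" a frozen element is to build a new one
g = dataclasses.replace(f, tokens=torch.tensor([7, 8]))
print(g.tokens.tolist(), f.tokens.tolist())  # [7, 8] [42, 2, 3]
```

If true immutability of the data matters, the tensors themselves would need to be cloned or treated as read-only by convention; the dataclass only guards the attribute bindings.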

Pseudo-code Logic:

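# Abstract pattern (NOT the trlx implementation)
from __future__ import annotations  # keeps annotations as unevaluated strings

from dataclasses import dataclass

from torch import Tensor

@dataclass
class RLElement:
    # One data point, e.g. a single rollout; shape labels are documentation only
    tokens: Tensor["seq_len"]
    rewards: Tensor["seq_len"]

@dataclass
class RLBatch:
    # Same fields as RLElement, plus a leading batch dimension
    tokens: Tensor["batch", "seq_len"]
    rewards: Tensor["batch", "seq_len"]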
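The abstract pattern above can be extended with conversions between the two shapes. This is a minimal sketch, assuming every element already has the same seq_len (no padding) and that RLElement and RLBatch share field names exactly; stack_elements and the __getitem__ method are hypothetical helpers, not part of trlx.

```python
from dataclasses import dataclass, fields
from typing import List

import torch


@dataclass
class RLElement:
    tokens: torch.Tensor   # [seq_len]
    rewards: torch.Tensor  # [seq_len]


@dataclass
class RLBatch:
    tokens: torch.Tensor   # [batch, seq_len]
    rewards: torch.Tensor  # [batch, seq_len]

    def __len__(self) -> int:
        return self.tokens.shape[0]

    def __getitem__(self, i: int) -> RLElement:
        # Indexing a batch drops the batch dimension and yields one element
        return RLElement(**{f.name: getattr(self, f.name)[i] for f in fields(self)})


def stack_elements(elements: List[RLElement]) -> RLBatch:
    # torch.stack adds a new leading batch dimension to each field
    return RLBatch(**{f.name: torch.stack([getattr(e, f.name) for e in elements])
                      for f in fields(RLElement)})


batch = stack_elements([
    RLElement(torch.tensor([1, 2]), torch.tensor([0.0, 1.0])),
    RLElement(torch.tensor([3, 4]), torch.tensor([0.0, 0.5])),
])
print(batch.tokens.shape)        # torch.Size([2, 2])
print(batch[1].tokens.tolist())  # [3, 4]
```

Iterating over dataclasses.fields keeps the conversion generic: adding a field to both classes requires no change to the helpers.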

Related Pages

Implementation:CarperAI_Trlx_Accelerate_Base_Datatypes
