
Principle:CarperAI Trlx Prompt Preparation

From Leeroopedia


Knowledge Sources
Domains: Data_Pipeline, NLP, Tokenization
Last Updated: 2026-02-07 16:00 GMT

Overview

A data pipeline principle for tokenizing and batching text prompts to serve as inputs for language model generation during RL training.

Description

In online RL training, the language model needs a stream of prompts to generate completions from. The prompt preparation pipeline converts raw text prompts into tokenized, padded batches suitable for model input. This involves truncation to a maximum prompt length (computed as seq_length - max_new_tokens), attention mask creation, and optional metadata passthrough for reward functions.

The pipeline must handle two input formats: simple string lists and dictionary lists with additional metadata (e.g., reference outputs for delta reward computation). It produces a DataLoader that yields batches of tokenized prompts during training.
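The two input formats can be normalized into one shape before tokenization. The sketch below is illustrative, not trlx's actual code; the function name and the error handling are assumptions:

```python
def normalize_prompts(prompts):
    """Convert string or dict prompts into a uniform dict format.

    Strings become {"prompt": text}; dicts must contain a "prompt" key and
    may carry extra metadata keys (e.g. reference outputs for delta reward
    computation) that pass through untouched to the reward function.
    """
    normalized = []
    for p in prompts:
        if isinstance(p, str):
            normalized.append({"prompt": p})
        elif isinstance(p, dict):
            if "prompt" not in p:
                raise ValueError("dict prompts must include a 'prompt' key")
            normalized.append(dict(p))  # shallow copy; metadata preserved
        else:
            raise TypeError(f"unsupported prompt type: {type(p).__name__}")
    return normalized
```

Downstream code can then tokenize the "prompt" field uniformly and forward any remaining keys alongside the generated samples.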

Usage

Use prompt preparation when setting up any trlx training that generates text from prompts: PPO training with a reward_fn, or evaluation prompt pipelines for ILQL/SFT. Prompt preparation is handled automatically by trlx.train(), but it can be customized by understanding the PromptPipeline interface.

Theoretical Basis

Prompt preparation transforms raw text into model-consumable format:

Pseudo-code:

# Abstract algorithm (not real implementation)
max_prompt_length = seq_length - max_new_tokens
tokenized = tokenizer(prompts, truncation=True, max_length=max_prompt_length)
batched = DataLoader(tokenized, batch_size=batch_size, collate_fn=pad_collate)
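The budget arithmetic and truncation step above can be made concrete with a toy whitespace "tokenizer". This is a minimal runnable sketch, not trlx's implementation, and the function name is hypothetical:

```python
def prepare_prompts(prompts, seq_length, max_new_tokens):
    """Tokenize (toy whitespace split) and right-truncate prompts.

    max_prompt_length reserves room for generation: any prompt longer than
    seq_length - max_new_tokens tokens is cut from the right, keeping the
    beginning of the prompt.
    """
    max_prompt_length = seq_length - max_new_tokens
    return [p.split()[:max_prompt_length] for p in prompts]
```

With seq_length=6 and max_new_tokens=3, a five-word prompt is truncated to its first three tokens, leaving exactly three positions for generated tokens.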

Key considerations:

  • Truncation: Prompts exceeding max length are truncated (right-side by default)
  • Padding: Shorter prompts are left-padded in batches for efficient generation
  • Metadata passthrough: Dict-format prompts carry extra keys to the reward function
  • Prompt budget: max_prompt_length = seq_length - max_new_tokens ensures room for generation
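The padding consideration can be sketched as a collate step (pure Python, no torch; pad_id=0 is an arbitrary choice here, not trlx's default): left-padding aligns every prompt's last token at the same position, so generation appends new tokens contiguously for the whole batch.

```python
def left_pad_collate(batch, pad_id=0):
    """Left-pad a batch of token-id lists and build attention masks.

    Padding positions get mask 0 so the model ignores them; real tokens
    get mask 1.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad = [pad_id] * (max_len - len(seq))
        input_ids.append(pad + seq)
        attention_mask.append([0] * len(pad) + [1] * len(seq))
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```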

Related Pages

Implemented By
