Principle: CarperAI trlx Prompt Preparation
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, NLP, Tokenization |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
A data pipeline principle for tokenizing and batching text prompts to serve as inputs for language model generation during RL training.
Description
In online RL training, the language model needs a stream of prompts to generate completions from. The prompt preparation pipeline converts raw text prompts into tokenized, padded batches suitable for model input. This involves truncation to a maximum prompt length (computed as seq_length - max_new_tokens), attention mask creation, and optional metadata passthrough for reward functions.
The pipeline must handle two input formats: simple string lists and dictionary lists with additional metadata (e.g., reference outputs for delta reward computation). It produces a DataLoader that yields batches of tokenized prompts during training.
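The dual input format can be sketched as a small normalization step. This is a hypothetical illustration, not the real trlx implementation; the function name `normalize_prompts` and the `"prompt"` dict key are assumptions for the sketch:

```python
def normalize_prompts(prompts):
    """Accept either list[str] or list[dict] (a 'prompt' key plus arbitrary
    metadata) and split them into texts and per-prompt metadata dicts."""
    texts, metadata = [], []
    for p in prompts:
        if isinstance(p, dict):
            p = dict(p)                    # avoid mutating the caller's data
            texts.append(p.pop("prompt"))  # assumed key holding the raw text
            metadata.append(p)             # remaining keys pass through to the reward function
        else:
            texts.append(p)
            metadata.append({})            # plain strings carry no metadata
    return texts, metadata
```

With this shape, a reward function can receive the extra keys (e.g. a reference output for delta reward computation) alongside each generated completion.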
Usage
Use prompt preparation when setting up any trlx training run that generates text from prompts: PPO training with a reward_fn, or evaluation prompt pipelines for ILQL/SFT. trlx.train() handles prompt preparation automatically, but it can be customized by understanding the PromptPipeline interface.
Theoretical Basis
Prompt preparation transforms raw text into model-consumable format:
Pseudo-code:

```python
# Abstract algorithm (not a real implementation)
max_prompt_length = seq_length - max_new_tokens
tokenized = tokenizer(prompts, truncation=True, max_length=max_prompt_length)
batched = DataLoader(tokenized, batch_size=batch_size, collate_fn=pad_collate)
```
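The prompt-budget arithmetic in the first line can be made concrete with toy numbers (the values below are assumptions for illustration, not trlx defaults):

```python
# Assumed example values, not trlx defaults.
seq_length = 1024      # total model context window
max_new_tokens = 256   # tokens reserved for generation
max_prompt_length = seq_length - max_new_tokens  # 768 tokens left for the prompt

# Right-side truncation keeps the first max_prompt_length token ids.
token_ids = list(range(1000))          # an over-long tokenized prompt
truncated = token_ids[:max_prompt_length]
```

If the prompt were not truncated to this budget, generation could run past the model's context window before producing max_new_tokens of output.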
Key considerations:
- Truncation: Prompts exceeding max length are truncated (right-side by default)
- Padding: Shorter prompts are left-padded in batches for efficient generation
- Metadata passthrough: Dict-format prompts carry extra keys to the reward function
- Prompt budget: max_prompt_length = seq_length - max_new_tokens ensures room for generation
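The left-padding behavior from the list above can be sketched as a minimal collate function. This is an illustrative stand-in for a real collator (the names `collate` and `pad_id` are assumptions), using plain Python lists rather than tensors:

```python
def collate(token_lists, pad_id=0):
    """Left-pad variable-length token id lists to a uniform width and
    build the matching attention masks (0 = padding, 1 = real token)."""
    width = max(len(t) for t in token_lists)
    input_ids, attention_mask = [], []
    for t in token_lists:
        pad = [pad_id] * (width - len(t))
        input_ids.append(pad + t)                    # pad on the LEFT
        attention_mask.append([0] * len(pad) + [1] * len(t))
    return input_ids, attention_mask
```

Left padding matters for generation: with all prompts right-aligned, the model appends new tokens directly after the last real prompt token in every row of the batch, instead of after a run of padding.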