Implementation: CarperAI trlx Default PPO Config
| Knowledge Sources | Details |
|---|---|
| Domains | Reinforcement_Learning, NLP, Configuration |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
Factory function from the trlx library that creates a default configuration for PPO training.
Description
The default_ppo_config() factory function returns a fully populated TRLConfig object with sensible defaults for PPO-based online RL training. It assembles TrainConfig, ModelConfig, TokenizerConfig, OptimizerConfig, SchedulerConfig, and PPOConfig into a single configuration hierarchy. The returned config can be customized via TRLConfig.update() (flat dict with dot-separated keys) or TRLConfig.evolve() (nested dict merging).
Usage
Import this function when you need a quick-start configuration for PPO training. Override specific parameters with TRLConfig.update() or TRLConfig.evolve() before passing the config to trlx.train(), as in the sketch below.
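A minimal end-to-end sketch, assuming the standard trlx.train() entry point; the reward function and prompts below are toy placeholders, not part of the library:
import trlx
from trlx.data.default_configs import default_ppo_config

config = default_ppo_config()
config.train.total_steps = 200  # shorten the run for a quick smoke test

# Toy reward: favor longer samples (stand-in for a real reward model)
def reward_fn(samples, **kwargs):
    return [float(len(sample)) for sample in samples]

trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=["The movie was"] * 64,
    config=config,
)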
Code Reference
Source Location
- Repository: trlx
- File: trlx/data/default_configs.py
- Lines: L17-59
Signature
def default_ppo_config() -> TRLConfig:
return TRLConfig(
train=TrainConfig(
seq_length=1024,
epochs=100,
total_steps=10000,
batch_size=32,
checkpoint_interval=10000,
eval_interval=100,
pipeline="PromptPipeline",
trainer="AcceleratePPOTrainer",
),
model=ModelConfig(model_path="lvwerra/gpt2-imdb", num_layers_unfrozen=2),
tokenizer=TokenizerConfig(tokenizer_path="gpt2", truncation_side="right"),
optimizer=OptimizerConfig(
name="adamw",
kwargs=dict(lr=3e-5, betas=(0.9, 0.95), eps=1.0e-8, weight_decay=1.0e-6),
),
scheduler=SchedulerConfig(
name="cosine_annealing",
kwargs=dict(T_max=1e12, eta_min=3e-5),
),
method=PPOConfig(
name="PPOConfig",
num_rollouts=128,
chunk_size=128,
ppo_epochs=4,
init_kl_coef=0.001,
target=None,
horizon=10000,
gamma=1,
lam=0.95,
cliprange=0.2,
cliprange_value=0.2,
vf_coef=1,
scale_reward="ignored",
ref_mean=None,
ref_std=None,
cliprange_reward=10,
gen_kwargs=dict(max_new_tokens=40, top_k=0, top_p=1.0, do_sample=True),
),
)
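These defaults mirror trlx's PPO sentiment example (note the lvwerra/gpt2-imdb model path). One detail worth flagging: the scheduler's eta_min equals the optimizer learning rate (3e-5), so the cosine schedule is effectively constant; lower eta_min if you want actual decay. To inspect the assembled hierarchy as plain data, a sketch assuming TRLConfig.to_dict():
from trlx.data.default_configs import default_ppo_config

config = default_ppo_config()
cfg = config.to_dict()  # nested dict mirroring the TRLConfig hierarchy
print(cfg["method"]["init_kl_coef"])     # 0.001
print(cfg["optimizer"]["kwargs"]["lr"])  # 3e-05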
Import
from trlx.data.default_configs import default_ppo_config
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | Factory function takes no arguments |
Outputs
| Name | Type | Description |
|---|---|---|
| return | TRLConfig | Fully configured TRLConfig with PPOConfig method, ModelConfig, TrainConfig, TokenizerConfig, OptimizerConfig, SchedulerConfig |
Usage Examples
Basic PPO Config
from trlx.data.default_configs import default_ppo_config
# Get default PPO configuration
config = default_ppo_config()
# Override model and training parameters
config.model.model_path = "gpt2"
config.train.batch_size = 16
config.method.init_kl_coef = 0.05
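Direct attribute assignment, as above, mutates the config object in place. By contrast, TRLConfig.update() and TRLConfig.evolve() both return a new TRLConfig, which is the safer choice when the default config object is shared across runs.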
Using TRLConfig.update()
from trlx.data.default_configs import default_ppo_config
from trlx.data.configs import TRLConfig
config = TRLConfig.update(
default_ppo_config(),
{
"model.model_path": "EleutherAI/gpt-j-6B",
"train.batch_size": 4,
"train.seq_length": 550,
"method.init_kl_coef": 0.1,
"method.num_rollouts": 128,
"method.gen_kwargs": dict(max_new_tokens=50, top_k=0, top_p=1.0, do_sample=True),
},
)
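Using TRLConfig.evolve()
A sketch of the nested-merge path; evolve() accepts keyword arguments of nested dicts and merges them into a copy of the config rather than mutating it:
from trlx.data.default_configs import default_ppo_config

config = default_ppo_config().evolve(
    train=dict(batch_size=8, seq_length=512),
    method=dict(
        init_kl_coef=0.05,
        gen_kwargs=dict(max_new_tokens=64),  # merged into the default gen_kwargs
    ),
)
assert config.train.batch_size == 8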