Implementation:CarperAI Trlx Default ILQL Config
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Offline_RL, Configuration |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
A factory function in the trlx library that creates the default training configuration for ILQL.
Description
The default_ilql_config() factory function takes no arguments and returns a fully populated TRLConfig with defaults for offline RL training using ILQL. Its ILQLConfig method section sets the expectile regression parameter (tau), the discount factor (gamma), the CQL and AWAC loss scales (cql_scale, awac_scale), and the Q-value guided generation parameters (gen_kwargs). The trainer is set to AccelerateILQLTrainer and the pipeline to PromptPipeline.
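To make the role of tau concrete, here is a minimal sketch of an expectile (asymmetric squared error) loss in plain Python. The function name `expectile_loss` and the list-based inputs are illustrative assumptions; this is not the trlx implementation, which operates on batched tensors with masking.

```python
def expectile_loss(targets, values, tau=0.7):
    """Mean asymmetric squared error: weight tau where target >= value, else 1 - tau."""
    total = 0.0
    for target, value in zip(targets, values):
        diff = target - value
        weight = tau if diff >= 0 else 1.0 - tau
        total += weight * diff * diff
    return total / len(targets)


# With tau > 0.5, a value estimate that is too low is penalized more than one
# that is too high, which pushes the estimate toward an upper expectile.
under = expectile_loss([1.0], [0.0], tau=0.7)  # diff = +1, weighted by tau
over = expectile_loss([0.0], [1.0], tau=0.7)   # diff = -1, weighted by 1 - tau
```

With tau=0.5 the loss reduces to half the ordinary mean squared error, so the default of 0.7 is a moderate tilt toward the higher targets.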
Usage
Import this function when setting up offline RL training from a reward-labeled dataset. Override specific parameters with TRLConfig.update() or TRLConfig.evolve(), then pass the resulting config to trlx.train() together with the samples and rewards arguments.
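The workflow described here could look roughly like the sketch below. The wrapper name `train_ilql` and the toy data are hypothetical, and the `eval_prompts` and `config` keyword names are assumed to match the `trlx.train()` signature of the installed version, so treat this as an outline rather than a verified recipe.

```python
def train_ilql(samples, rewards, eval_prompts=None):
    """Train with the default ILQL config on reward-labeled text samples."""
    # Validate first so obvious data problems surface before any model loading.
    if len(samples) != len(rewards):
        raise ValueError("each sample needs exactly one reward")

    # Imported lazily so this sketch can be loaded without trlx installed.
    import trlx
    from trlx.data.default_configs import default_ilql_config

    # Hypothetical override: a smaller batch size than the default of 128.
    config = default_ilql_config().evolve(train=dict(batch_size=64))
    return trlx.train(
        samples=samples,
        rewards=rewards,
        eval_prompts=eval_prompts,
        config=config,
    )


# Hypothetical toy data; a real run needs a much larger dataset.
# train_ilql(["good movie", "bad movie"], [1.0, -1.0], eval_prompts=["The movie was"])
```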
Code Reference
Source Location
- Repository: trlx
- File: trlx/data/default_configs.py
- Lines: L62-94
Signature
def default_ilql_config() -> TRLConfig:
    return TRLConfig(
        train=TrainConfig(
            seq_length=64,
            batch_size=128,
            epochs=100,
            total_steps=1000,
            checkpoint_interval=1000,
            eval_interval=100,
            pipeline="PromptPipeline",
            trainer="AccelerateILQLTrainer",
        ),
        model=ModelConfig(model_path="gpt2", num_layers_unfrozen=-1),
        tokenizer=TokenizerConfig(tokenizer_path="gpt2", truncation_side="right"),
        optimizer=OptimizerConfig(
            name="adamw",
            kwargs=dict(lr=5.0e-5, betas=(0.9, 0.95), eps=1.0e-8, weight_decay=1.0e-6),
        ),
        scheduler=SchedulerConfig(
            name="cosine_annealing",
            kwargs=dict(T_max=1e12, eta_min=5.0e-5),
        ),
        method=ILQLConfig(
            name="ilqlconfig",
            tau=0.7,
            gamma=0.99,
            cql_scale=0.1,
            awac_scale=1,
            alpha=0.001,
            beta=0,
            steps_for_target_q_sync=5,
            two_qs=True,
            gen_kwargs=dict(max_new_tokens=56, top_k=20, beta=1, temperature=1.0),
        ),
    )
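One consequence of these scheduler defaults is worth spelling out: eta_min equals the optimizer's lr, so the schedule has nothing to decay. The sketch below evaluates the closed-form cosine annealing formula in plain Python, assuming "cosine_annealing" resolves to the standard schedule implemented by PyTorch's CosineAnnealingLR. The helper `cosine_annealing_lr` is illustrative and not part of trlx.

```python
import math


def cosine_annealing_lr(step, base_lr=5.0e-5, eta_min=5.0e-5, t_max=1e12):
    """Closed form: eta_min + (base_lr - eta_min) * (1 + cos(pi * step / t_max)) / 2."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


# With the defaults, base_lr == eta_min, so the learning rate stays constant.
default_lrs = [cosine_annealing_lr(step) for step in (0, 500, 1000)]

# For contrast (hypothetical override): decay to zero over the 1000 training steps.
decayed = cosine_annealing_lr(1000, eta_min=0.0, t_max=1000)
```

To get an actual decay, both values would need to change: a lower eta_min and a T_max on the order of train.total_steps.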
Import
from trlx.data.default_configs import default_ilql_config
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | Factory function takes no arguments |
Outputs
| Name | Type | Description |
|---|---|---|
| return | TRLConfig | Fully configured TRLConfig with ILQLConfig method, trainer set to AccelerateILQLTrainer |
Usage Examples
Basic ILQL Config
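from trlx.data.default_configs import default_ilql_config

# Start from the defaults, then override individual fields in place.
config = default_ilql_config()
config.train.batch_size = 64  # default: 128
config.method.tau = 0.8  # default: 0.7
config.method.gamma = 0.99  # same as the default; set explicitly for clarity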
With TRLConfig.evolve()
from trlx.data.default_configs import default_ilql_config

config = default_ilql_config()
# evolve() returns a new TRLConfig with the overrides applied.
config = config.evolve(
    method=dict(gamma=0.99, gen_kwargs=dict(max_new_tokens=100)),
    train=dict(batch_size=64, total_steps=5000),
)
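The evolve() call above passes only max_new_tokens inside gen_kwargs. Assuming evolve() merges nested dictionaries recursively rather than replacing them wholesale (worth confirming against the installed trlx version), the other generation settings keep their defaults. The sketch below illustrates that merge behaviour on plain dictionaries. `merge_overrides` is a hypothetical helper, not the trlx function.

```python
import copy


def merge_overrides(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively to nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections instead of overwriting them.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Subset of the ILQL method defaults shown in the signature.
method_defaults = {
    "gamma": 0.99,
    "gen_kwargs": {"max_new_tokens": 56, "top_k": 20, "beta": 1, "temperature": 1.0},
}
updated = merge_overrides(method_defaults, {"gen_kwargs": {"max_new_tokens": 100}})
```

Under this merge, `updated["gen_kwargs"]` still carries top_k, beta, and temperature, and the original dictionary is left unmodified.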