Environment:CarperAI Trlx Python Accelerate
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Reinforcement_Learning, NLP |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
Linux environment with Python 3.9-3.11, PyTorch 2.0+ with CUDA 11.8, HuggingFace Transformers, and Accelerate for single or multi-GPU RLHF training.
Description
This environment provides the base runtime context for all Accelerate-based training in trlx. It is built on PyTorch with CUDA support and uses HuggingFace Accelerate for device management, mixed precision, and distributed training orchestration. The stack includes Transformers for model loading, Datasets for data handling, and W&B for experiment tracking. Optional packages include bitsandbytes for 8-bit optimizers and PEFT for parameter-efficient fine-tuning.
Usage
Use this environment for all trlx training workflows: PPO (online RL), ILQL (offline RL), SFT (supervised fine-tuning), and RFT (rejection fine-tuning). It is the mandatory prerequisite for running any Accelerate-based trainer, including sentiment generation, dialogue alignment, and summarization tasks.
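A typical workflow on top of this environment looks like the following (commands assume a checkout of the trlx repository; `examples/ppo_sentiments.py` ships with it):

```shell
# One-time: answer the interactive prompts (GPU count, mixed precision, ...)
accelerate config

# Launch PPO sentiment training across all configured GPUs
accelerate launch examples/ppo_sentiments.py
```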
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Tested on Ubuntu; macOS/Windows not officially supported |
| Python | 3.9 - 3.11 | CI runs on Python 3.9 |
| Hardware | NVIDIA GPU with CUDA support | CPU-only possible but impractical for RL training |
| CUDA | 11.8 | Pinned in requirements.txt as `torch==2.0.1+cu118` |
| Disk | 10GB+ | Model checkpoints and datasets require significant storage |
Dependencies
System Packages
- CUDA Toolkit 11.8
- NVIDIA drivers compatible with CUDA 11.8
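A quick way to confirm the driver/toolkit pairing before installing the Python stack:

```shell
# Driver version and the maximum CUDA version it supports
nvidia-smi
# Installed CUDA toolkit version (should report 11.8)
nvcc --version
```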
Python Packages (Core)
- `torch` == 2.0.1 (with CUDA 11.8)
- `transformers` >= 4.27.1 (tested with 4.32.0)
- `accelerate` >= 0.17.1 (tested with 0.22.0)
- `datasets` (HuggingFace Datasets)
- `deepspeed` >= 0.8.1 (tested with 0.10.1)
- `einops` >= 0.4.1
- `numpy` >= 1.23.2
- `torchtyping`
- `tqdm`
- `rich`
- `wandb` >= 0.13.5
- `ray` >= 2.4.0
- `tabulate` >= 0.9.0
- `networkx`
- `cattrs` >= 22.2.0
- `attrs` >= 22.1.0
Python Packages (Optional)
- `bitsandbytes` + `scipy` — 8-bit Adam/AdamW optimizers
- `peft` >= 0.5.0 — LoRA and other parameter-efficient fine-tuning methods
- `triton` == 2.0.0 — Triton kernel acceleration
- `tritonclient` == 2.36.0 — Triton Inference Server client for reward model serving
- `evaluate` >= 0.4.0 — HuggingFace Evaluate (for ROUGE metrics)
- `nltk` >= 3.8.1 — Natural language processing utilities
- `rouge-score` >= 0.1.2 — ROUGE scoring
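The optional packages above can be probed without importing them, using the same `importlib.util.find_spec` pattern trlx itself relies on for PEFT detection; the helper name `optional_available` here is illustrative:

```python
import importlib.util

def optional_available(pkg: str) -> bool:
    """Return True if an optional package is installed, without importing it."""
    return importlib.util.find_spec(pkg) is not None

# Probe the optional extras before enabling the features that need them
extras = {pkg: optional_available(pkg) for pkg in ("peft", "bitsandbytes", "evaluate")}
print(extras)
```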
Credentials
The following environment variables may be required depending on workflow:
- `WANDB_API_KEY`: Weights & Biases API key for experiment tracking (used by `config.train.tracker`)
- `HF_TOKEN`: HuggingFace API token for accessing gated models
- `TRITON_HOST`: Triton Inference Server hostname (only for reward model serving via Triton)
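A minimal preflight check for these variables, run before launching training; the variable names mirror the list above, and the check itself is purely illustrative:

```python
import os

# Required only when config.train.tracker == "wandb"; the rest are feature-gated.
REQUIRED = ("WANDB_API_KEY",)
OPTIONAL = ("HF_TOKEN", "TRITON_HOST")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print(f"Warning: missing credentials: {missing}")
for name in OPTIONAL:
    if not os.environ.get(name):
        print(f"Note: {name} unset; the corresponding feature will be unavailable.")
```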
Distributed Training Variables (Auto-set by Accelerate)
- `WORLD_SIZE`: Total number of processes (auto-set)
- `LOCAL_RANK`: Local GPU rank (auto-set)
- `RANK`: Global process rank (auto-set)
- `ACCELERATE_DEEPSPEED_ZERO_STAGE`: DeepSpeed ZeRO stage (auto-set when using DeepSpeed)
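Because Accelerate injects these variables at launch, a script can read them with single-process defaults so the same entry point also works under plain `python` — a sketch:

```python
import os

# Accelerate sets these when launching; fall back to single-process defaults
# so the script still runs without a launcher.
world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))
is_main_process = int(os.environ.get("RANK", 0)) == 0

print(f"world_size={world_size} local_rank={local_rank} main={is_main_process}")
```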
Quick Install
# Install PyTorch with CUDA 11.8
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# Install trlx and all core dependencies
pip install git+https://github.com/CarperAI/trlx.git
# Optional: 8-bit optimizers
pip install bitsandbytes scipy
# Optional: Parameter-efficient fine-tuning
pip install "peft>=0.5.0"
# Optional: ROUGE evaluation (for summarization)
pip install "evaluate>=0.4.0" "nltk>=3.8.1" "rouge-score>=0.1.2"
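To verify what the install steps above actually provided, the standard library's `importlib.metadata` can report installed versions (package names as in Quick Install):

```python
from importlib.metadata import PackageNotFoundError, version

# Core packages the Quick Install above should provide
for pkg in ("torch", "transformers", "accelerate", "trlx"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```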
Code Evidence
PEFT availability detection from `trlx/utils/__init__.py:19-20`:
def is_peft_available():
    return importlib.util.find_spec("peft") is not None
8-bit optimizer import with error handling from `trlx/utils/__init__.py:104-113`:
if name == OptimizerName.ADAM_8BIT_BNB.value:
    try:
        from bitsandbytes.optim import Adam8bit

        return Adam8bit
    except ImportError:
        raise ImportError(
            "You must install the `bitsandbytes` package to use the 8-bit Adam. "
            "Install with: `pip install bitsandbytes`"
        )
NeMo optional dependency handling from `trlx/utils/loading.py:14-28`:
try:
    from trlx.trainer.nemo_ilql_trainer import NeMoILQLTrainer
    from trlx.trainer.nemo_ppo_trainer import NeMoPPOTrainer
    from trlx.trainer.nemo_sft_trainer import NeMoSFTTrainer
except ImportError:

    def _trainers_unavailble(names: List[str]):
        def log_error(*args, **kwargs):
            raise ImportError("NeMo is not installed.")

        for name in names:
            register_trainer(name)(log_error)
8-bit loading explicitly unsupported from `trlx/models/modeling_base.py:72-77`:
self.is_loaded_in_8bit = getattr(base_model, "is_loaded_in_8bit", False)
if self.is_loaded_in_8bit:
    raise NotImplementedError(
        "`is_loaded_in_8bit` is an experimental feature not yet fully supported."
    )
CUDA device detection from `examples/ppo_sentiments.py:25`:
if torch.cuda.is_available():
    device = int(os.environ.get("LOCAL_RANK", 0))
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: You must install the bitsandbytes package` | bitsandbytes not installed when using 8-bit optimizer | `pip install bitsandbytes scipy` |
| `ImportError: NeMo is not installed` | Attempting to use NeMo-based trainers without nemo_toolkit | Install NVIDIA NeMo Toolkit (separate environment recommended) |
| `NotImplementedError: is_loaded_in_8bit is an experimental feature` | Loading model with `load_in_8bit=True` | Do not use 8-bit model loading; use 8-bit optimizers instead |
| `AssertionError: Minibatch size must divide batch size` | `batch_size % minibatch_size != 0` | Ensure batch_size is evenly divisible by minibatch_size |
| `CUDA out of memory` | Model or batch too large for GPU VRAM | Reduce batch_size, use gradient accumulation, enable DeepSpeed ZeRO, or use PEFT/LoRA |
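The divisibility constraint from the table can be validated up front rather than failing mid-setup; `check_batch_config` is a hypothetical helper for illustration, not part of trlx:

```python
def check_batch_config(batch_size: int, minibatch_size: int) -> int:
    """Validate PPO batch settings; returns the number of minibatches per batch."""
    if batch_size % minibatch_size != 0:
        raise ValueError(
            f"Minibatch size must divide batch size "
            f"({batch_size} % {minibatch_size} != 0)"
        )
    return batch_size // minibatch_size

print(check_batch_config(128, 32))  # 4 minibatches per batch
```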
Compatibility Notes
- NeMo trainers: Require a separate NVIDIA NeMo environment (not covered by this page). They are registered as dummy trainers that raise ImportError if NeMo is missing.
- 8-bit model loading: Explicitly unsupported (`NotImplementedError`). Use 8-bit optimizers via bitsandbytes instead.
- bitsandbytes embeddings: When using 8-bit optimizers, embedding weights are forced to 32-bit for numerical stability (see `accelerate_base_trainer.py:183-191`).
- Tokenizer padding: A custom `<|padding|>` token is added if the tokenizer has no pad_token defined.
- Mixed precision: Both fp16 and bf16 supported via Accelerate configuration.
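Mixed precision can also be chosen per run at launch time instead of in the saved Accelerate config; a sketch using the `accelerate launch` flag:

```shell
# Override the configured precision for a single run (no | fp16 | bf16)
accelerate launch --mixed_precision bf16 examples/ppo_sentiments.py
```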
Related Pages
- Implementation:CarperAI_Trlx_Default_PPO_Config
- Implementation:CarperAI_Trlx_Default_ILQL_Config
- Implementation:CarperAI_Trlx_Default_SFT_Config
- Implementation:CarperAI_Trlx_PromptPipeline
- Implementation:CarperAI_Trlx_Trlx_Train_Online
- Implementation:CarperAI_Trlx_Trlx_Train_Offline
- Implementation:CarperAI_Trlx_Trlx_Train_SFT
- Implementation:CarperAI_Trlx_Save_Pretrained
- Implementation:CarperAI_Trlx_ILQL_Generate
- Implementation:CarperAI_Trlx_Reward_Function_Interface
- Implementation:CarperAI_Trlx_Metric_Function_Interface
- Implementation:CarperAI_Trlx_ROUGE_Metric_Evaluation
- Implementation:CarperAI_Trlx_DSL_Interpreter
- Implementation:CarperAI_Trlx_Sweep
- Implementation:CarperAI_Trlx_Logging
- Implementation:CarperAI_Trlx_Random_Walk_Environment
- Implementation:CarperAI_Trlx_Accelerate_Base_Datatypes
- Implementation:CarperAI_Trlx_Reference_Benchmark
- Implementation:CarperAI_Trlx_Accelerate_RFT_Trainer