Environment:CarperAI Trlx Python Accelerate
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Reinforcement_Learning, NLP |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
Linux environment with Python 3.9-3.11, PyTorch 2.0+ with CUDA 11.8, HuggingFace Transformers, and Accelerate for single or multi-GPU RLHF training.
Description
This environment provides the base runtime context for all Accelerate-based training in trlx. It is built on PyTorch with CUDA support and uses HuggingFace Accelerate for device management, mixed precision, and distributed training orchestration. The stack includes Transformers for model loading, Datasets for data handling, and W&B for experiment tracking. Optional packages include bitsandbytes for 8-bit optimizers and PEFT for parameter-efficient fine-tuning.
Usage
Use this environment for all trlx training workflows: PPO (online RL), ILQL (offline RL), SFT (supervised fine-tuning), and RFT (rejection fine-tuning). It is the mandatory prerequisite for running any Accelerate-based trainer, including sentiment generation, dialogue alignment, and summarization tasks.
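A typical workflow on top of this environment looks like the following (commands assume a checkout of the trlx repository; `examples/ppo_sentiments.py` ships with it):

```shell
# One-time: answer the interactive prompts (GPU count, mixed precision, ...)
accelerate config

# Launch PPO sentiment training across all configured GPUs
accelerate launch examples/ppo_sentiments.py
```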
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Tested on Ubuntu; macOS/Windows not officially supported |
| Python | 3.9 - 3.11 | CI runs on Python 3.9 |
| Hardware | NVIDIA GPU with CUDA support | CPU-only possible but impractical for RL training |
| CUDA | 11.8 | Pinned in requirements.txt as `torch==2.0.1+cu118` |
| Disk | 10GB+ | Model checkpoints and datasets require significant storage |
Dependencies
System Packages
- CUDA Toolkit 11.8
- NVIDIA drivers compatible with CUDA 11.8
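A quick way to confirm the driver/toolkit pairing before installing the Python stack:

```shell
# Driver version and the maximum CUDA version it supports
nvidia-smi
# Installed CUDA toolkit version (should report 11.8)
nvcc --version
```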
Python Packages (Core)
- `torch` == 2.0.1 (with CUDA 11.8)
- `transformers` >= 4.27.1 (tested with 4.32.0)
- `accelerate` >= 0.17.1 (tested with 0.22.0)
- `datasets` (HuggingFace Datasets)
- `deepspeed` >= 0.8.1 (tested with 0.10.1)
- `einops` >= 0.4.1
- `numpy` >= 1.23.2
- `torchtyping`
- `tqdm`
- `rich`
- `wandb` >= 0.13.5
- `ray` >= 2.4.0
- `tabulate` >= 0.9.0
- `networkx`
- `cattrs` >= 22.2.0
- `attrs` >= 22.1.0
Python Packages (Optional)
- `bitsandbytes` + `scipy` — 8-bit Adam/AdamW optimizers
- `peft` >= 0.5.0 — LoRA and other parameter-efficient fine-tuning methods
- `triton` == 2.0.0 — Triton kernel acceleration
- `tritonclient` == 2.36.0 — Triton Inference Server client for reward model serving
- `evaluate` >= 0.4.0 — HuggingFace Evaluate (for ROUGE metrics)
- `nltk` >= 3.8.1 — Natural language processing utilities
- `rouge-score` >= 0.1.2 — ROUGE scoring
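The optional packages above can be probed without importing them, using the same `importlib.util.find_spec` pattern trlx itself relies on for PEFT detection; the helper name `optional_available` here is illustrative:

```python
import importlib.util

def optional_available(pkg: str) -> bool:
    """Return True if an optional package is installed, without importing it."""
    return importlib.util.find_spec(pkg) is not None

# Probe the optional extras before enabling the features that need them
extras = {pkg: optional_available(pkg) for pkg in ("peft", "bitsandbytes", "evaluate")}
print(extras)
```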
Credentials
The following environment variables may be required depending on workflow:
- `WANDB_API_KEY`: Weights & Biases API key for experiment tracking (used by `config.train.tracker`)
- `HF_TOKEN`: HuggingFace API token for accessing gated models
- `TRITON_HOST`: Triton Inference Server hostname (only for reward model serving via Triton)
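A minimal preflight check for these variables, run before launching training; the variable names mirror the list above, and the check itself is purely illustrative:

```python
import os

# Required only when config.train.tracker == "wandb"; the rest are feature-gated.
REQUIRED = ("WANDB_API_KEY",)
OPTIONAL = ("HF_TOKEN", "TRITON_HOST")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print(f"Warning: missing credentials: {missing}")
for name in OPTIONAL:
    if not os.environ.get(name):
        print(f"Note: {name} unset; the corresponding feature will be unavailable.")
```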
Distributed Training Variables (Auto-set by Accelerate)
- `WORLD_SIZE`: Total number of processes (auto-set)
- `LOCAL_RANK`: Local GPU rank (auto-set)
- `RANK`: Global process rank (auto-set)
- `ACCELERATE_DEEPSPEED_ZERO_STAGE`: DeepSpeed ZeRO stage (auto-set when using DeepSpeed)
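Because Accelerate injects these variables at launch, a script can read them with single-process defaults so the same entry point also works under plain `python` — a sketch:

```python
import os

# Accelerate sets these when launching; fall back to single-process defaults
# so the script still runs without a launcher.
world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))
is_main_process = int(os.environ.get("RANK", 0)) == 0

print(f"world_size={world_size} local_rank={local_rank} main={is_main_process}")
```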
Quick Install
# Install PyTorch with CUDA 11.8
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# Install trlx and all core dependencies
pip install git+https://github.com/CarperAI/trlx.git
# Optional: 8-bit optimizers
pip install bitsandbytes scipy
# Optional: Parameter-efficient fine-tuning
pip install "peft>=0.5.0"
# Optional: ROUGE evaluation (for summarization)
pip install "evaluate>=0.4.0" "nltk>=3.8.1" "rouge-score>=0.1.2"
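To verify what the install steps above actually provided, the standard library's `importlib.metadata` can report installed versions (package names as in Quick Install):

```python
from importlib.metadata import PackageNotFoundError, version

# Core packages the Quick Install above should provide
for pkg in ("torch", "transformers", "accelerate", "trlx"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```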
Code Evidence
PEFT availability detection from `trlx/utils/__init__.py:19-20`:
def is_peft_available():
    return importlib.util.find_spec("peft") is not None
8-bit optimizer import with error handling from `trlx/utils/__init__.py:104-113`:
if name == OptimizerName.ADAM_8BIT_BNB.value:
    try:
        from bitsandbytes.optim import Adam8bit

        return Adam8bit
    except ImportError:
        raise ImportError(
            "You must install the `bitsandbytes` package to use the 8-bit Adam. "
            "Install with: `pip install bitsandbytes`"
        )
NeMo optional dependency handling from `trlx/utils/loading.py:14-28`:
try:
    from trlx.trainer.nemo_ilql_trainer import NeMoILQLTrainer
    from trlx.trainer.nemo_ppo_trainer import NeMoPPOTrainer
    from trlx.trainer.nemo_sft_trainer import NeMoSFTTrainer
except ImportError:

    def _trainers_unavailble(names: List[str]):
        def log_error(*args, **kwargs):
            raise ImportError("NeMo is not installed.")

        for name in names:
            register_trainer(name)(log_error)
8-bit loading explicitly unsupported from `trlx/models/modeling_base.py:72-77`:
self.is_loaded_in_8bit = getattr(base_model, "is_loaded_in_8bit", False)
if self.is_loaded_in_8bit:
    raise NotImplementedError(
        "`is_loaded_in_8bit` is an experimental feature not yet fully supported."
    )
CUDA device detection from `examples/ppo_sentiments.py:25`:
if torch.cuda.is_available():
    device = int(os.environ.get("LOCAL_RANK", 0))
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: You must install the bitsandbytes package` | bitsandbytes not installed when using 8-bit optimizer | `pip install bitsandbytes scipy` |
| `ImportError: NeMo is not installed` | Attempting to use NeMo-based trainers without nemo_toolkit | Install NVIDIA NeMo Toolkit (separate environment recommended) |
| `NotImplementedError: is_loaded_in_8bit is an experimental feature` | Loading model with `load_in_8bit=True` | Do not use 8-bit model loading; use 8-bit optimizers instead |
| `AssertionError: Minibatch size must divide batch size` | `batch_size % minibatch_size != 0` | Ensure batch_size is evenly divisible by minibatch_size |
| `CUDA out of memory` | Model or batch too large for GPU VRAM | Reduce batch_size, use gradient accumulation, enable DeepSpeed ZeRO, or use PEFT/LoRA |
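The divisibility constraint from the table can be validated up front rather than failing mid-setup; `check_batch_config` is a hypothetical helper for illustration, not part of trlx:

```python
def check_batch_config(batch_size: int, minibatch_size: int) -> int:
    """Validate PPO batch settings; returns the number of minibatches per batch."""
    if batch_size % minibatch_size != 0:
        raise ValueError(
            f"Minibatch size must divide batch size "
            f"({batch_size} % {minibatch_size} != 0)"
        )
    return batch_size // minibatch_size

print(check_batch_config(128, 32))  # 4 minibatches per batch
```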
Compatibility Notes
- NeMo trainers: Require a separate NVIDIA NeMo environment (not covered by this page). They are registered as dummy trainers that raise ImportError if NeMo is missing.
- 8-bit model loading: Explicitly unsupported (`NotImplementedError`). Use 8-bit optimizers via bitsandbytes instead.
- bitsandbytes embeddings: When using 8-bit optimizers, embedding weights are forced to 32-bit for numerical stability (see `accelerate_base_trainer.py:183-191`).
- Tokenizer padding: A custom `<|padding|>` token is added if the tokenizer has no pad_token defined.
- Mixed precision: Both fp16 and bf16 supported via Accelerate configuration.
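Mixed precision can also be chosen per run at launch time instead of in the saved Accelerate config; a sketch using the `accelerate launch` flag:

```shell
# Override the configured precision for a single run (no | fp16 | bf16)
accelerate launch --mixed_precision bf16 examples/ppo_sentiments.py
```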
Related Pages
- Implementation:CarperAI_Trlx_Default_PPO_Config
- Implementation:CarperAI_Trlx_Default_ILQL_Config
- Implementation:CarperAI_Trlx_Default_SFT_Config
- Implementation:CarperAI_Trlx_PromptPipeline
- Implementation:CarperAI_Trlx_Trlx_Train_Online
- Implementation:CarperAI_Trlx_Trlx_Train_Offline
- Implementation:CarperAI_Trlx_Trlx_Train_SFT
- Implementation:CarperAI_Trlx_Save_Pretrained
- Implementation:CarperAI_Trlx_ILQL_Generate
- Implementation:CarperAI_Trlx_Reward_Function_Interface
- Implementation:CarperAI_Trlx_Metric_Function_Interface
- Implementation:CarperAI_Trlx_ROUGE_Metric_Evaluation
- Implementation:CarperAI_Trlx_DSL_Interpreter
- Implementation:CarperAI_Trlx_Sweep
- Implementation:CarperAI_Trlx_Logging
- Implementation:CarperAI_Trlx_Random_Walk_Environment
- Implementation:CarperAI_Trlx_Accelerate_Base_Datatypes
- Implementation:CarperAI_Trlx_Reference_Benchmark
- Implementation:CarperAI_Trlx_Accelerate_RFT_Trainer