Environment:Huggingface Alignment handbook Python TRL
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Python environment with TRL >= 0.19.1 providing SFTTrainer, DPOTrainer, ORPOTrainer, TrlParser, and PEFT integration utilities.
Description
TRL (Transformer Reinforcement Learning) is the core training library used by the alignment-handbook. It provides the trainer classes (SFTTrainer, DPOTrainer, ORPOTrainer), the configuration parser (TrlParser), and utility functions (get_peft_config, get_quantization_config, get_kbit_device_map, setup_chat_format). All three training scripts (`scripts/sft.py`, `scripts/dpo.py`, `scripts/orpo.py`) import directly from TRL.
Usage
Use this environment for any training script in the alignment-handbook. TRL is a mandatory dependency that provides the trainer implementations, config parsing, and LoRA/quantization utilities used by every workflow.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | >= 3.10.9 | Required by the alignment-handbook package |
| Hardware | NVIDIA GPU | TRL trainers use the PyTorch CUDA backend |
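Both requirements in the table can be checked before launching a run. The sketch below is a minimal preflight check and is not part of the alignment-handbook. The function names `python_ok` and `cuda_available` are hypothetical, and the CUDA check assumes `torch` is importable (it returns `None` otherwise).

```python
import sys

MIN_PYTHON = (3, 10, 9)  # minimum taken from the requirements table


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter version meets the minimum."""
    if version is None:
        version = tuple(sys.version_info[:3])
    return tuple(version) >= tuple(minimum)


def cuda_available():
    """Return torch.cuda.is_available() as a bool, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("CUDA available:", cuda_available())
```

Returning `None` rather than `False` when `torch` is missing keeps "no GPU" distinguishable from "PyTorch not installed yet".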
Dependencies
Python Packages
- `trl` >= 0.19.1
- `transformers` >= 4.53.3 (peer dependency)
- `accelerate` >= 1.9.0 (peer dependency)
- `torch` >= 2.6.0 (peer dependency)
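The minimums listed above can be compared against the installed packages. The following is a rough sketch using only the standard library. The helper names `parse_version` and `check_minimums` are hypothetical, and the comparison looks only at the leading numeric release components, so pre-release and local suffixes such as `.dev0` or `+cu124` are ignored.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "trl": "0.19.1",
    "transformers": "4.53.3",
    "accelerate": "1.9.0",
    "torch": "2.6.0",
}


def parse_version(text):
    """Turn '2.6.0+cu124' into (2, 6, 0); anything after the numeric release is dropped."""
    match = re.match(r"\d+(\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_minimums(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all minimums are met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_minimums() or ["all minimum versions satisfied"]:
        print(line)
```

For strict PEP 440 handling (for example, treating `0.19.1.dev0` as older than `0.19.1`), the `packaging` library's version parser would be a more careful choice than this simplified tuple comparison.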
Credentials
No additional credentials beyond those in the PyTorch_CUDA environment.
Quick Install
# TRL is installed as part of the alignment-handbook
uv pip install .
# Or install TRL standalone
pip install "trl>=0.19.1"
Code Evidence
TRL version requirement from `setup.py:69`:
"trl>=0.19.1",
TRL imports in `scripts/sft.py:48`:
from trl import ModelConfig, SFTTrainer, TrlParser, get_peft_config, setup_chat_format
TRL imports in `scripts/dpo.py:61`:
from trl import DPOTrainer, ModelConfig, TrlParser, get_peft_config
TRL imports in `scripts/orpo.py:62`:
from trl import ModelConfig, ORPOTrainer, TrlParser, get_peft_config
TRL base classes in `src/alignment/configs.py:33-34,57,134,143,152`:
import trl
class ScriptArguments(trl.ScriptArguments):
class SFTConfig(trl.SFTConfig):
class DPOConfig(trl.DPOConfig):
class ORPOConfig(trl.ORPOConfig):
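The subclassing pattern in these excerpts can be illustrated without TRL installed. In the sketch below, `BaseSFTConfig` is a stand-in for `trl.SFTConfig`, and both its fields and `extra_option` are hypothetical names chosen for illustration. None of this is the actual content of `src/alignment/configs.py`.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class BaseSFTConfig:
    """Stand-in for trl.SFTConfig so the sketch runs without TRL installed."""

    learning_rate: float = 2e-5
    max_length: int = 1024


@dataclass
class SFTConfig(BaseSFTConfig):
    """Subclass that keeps every inherited field and adds one more."""

    # Hypothetical extra field, for illustration only.
    extra_option: Optional[str] = field(
        default=None,
        metadata={"help": "Hypothetical handbook-specific setting."},
    )


config = SFTConfig(learning_rate=1e-5, extra_option="demo")
field_names = [f.name for f in fields(SFTConfig)]
print(field_names)
```

Because the subclass is itself a dataclass, the inherited fields and the added one appear side by side, which is what lets a single parser populate both from one config file.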
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: cannot import name 'SFTTrainer' from 'trl'` | TRL version too old | Upgrade: `pip install "trl>=0.19.1"` |
| `AttributeError: 'SFTConfig' has no attribute 'packing_strategy'` | TRL version missing newer features | Upgrade TRL to the latest version for SmolLM3 recipes |
| `ImportError: cannot import name 'TrlParser' from 'trl'` | TRL version too old | Upgrade: `pip install "trl>=0.19.1"` |
Compatibility Notes
- TRL version: The alignment-handbook extends TRL's config classes (ScriptArguments, SFTConfig, DPOConfig, ORPOConfig) with additional fields. Ensure the installed TRL version meets or exceeds the requirement (`trl>=0.19.1`).
- SmolLM3 recipes: Newer features like `packing_strategy: ffd`, `padding_free: true`, `use_liger_kernel: true`, and `assistant_only_loss: true` require a recent TRL release; upgrade TRL if a recipe fails on one of these options.
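One way to find out whether an installed config class supports the recipe options named above is to inspect its dataclass fields. The sketch below is an assumption-laden illustration: `missing_options` is a hypothetical helper, and `OldConfig` is a stand-in for a config class from an older release so that the example runs without TRL.

```python
from dataclasses import dataclass, fields

# Option names taken from the SmolLM3 recipe note above.
RECIPE_OPTIONS = (
    "packing_strategy",
    "padding_free",
    "use_liger_kernel",
    "assistant_only_loss",
)


def missing_options(config_cls, names=RECIPE_OPTIONS):
    """Return the option names that are not dataclass fields of config_cls."""
    available = {f.name for f in fields(config_cls)}
    return [name for name in names if name not in available]


@dataclass
class OldConfig:
    """Stand-in for a config class from an older release (illustration only)."""

    padding_free: bool = False


print(missing_options(OldConfig))
# → ['packing_strategy', 'use_liger_kernel', 'assistant_only_loss']
```

With TRL installed, calling `missing_options(trl.SFTConfig)` should run the same check against the real class, assuming it is a dataclass; a non-empty result suggests the installed release predates the recipe.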