Environment:Huggingface Alignment handbook Python TRL

Knowledge Sources	Alignment Handbook TRL Documentation
Domains	NLP, Deep_Learning, Training
Last Updated	2026-02-07 00:00 GMT

Overview

Python environment with TRL >= 0.19.1 providing SFTTrainer, DPOTrainer, ORPOTrainer, TrlParser, and PEFT integration utilities.

Description

TRL (Transformer Reinforcement Learning) is the core training library used by the alignment-handbook. It provides the trainer classes (SFTTrainer, DPOTrainer, ORPOTrainer), the configuration parser (TrlParser), and utility functions (get_peft_config, get_quantization_config, get_kbit_device_map, setup_chat_format). All three training scripts import directly from TRL.

Usage

Use this environment for any training script in the alignment-handbook. TRL is a mandatory dependency that provides the trainer implementations, config parsing, and LoRA/quantization utilities used by every workflow.
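
As a rough illustration of how the parser and the PEFT utility fit together, the sketch below parses `ModelConfig` flags with `TrlParser` and passes the result to `get_peft_config`. The helper name `build_peft_config` is hypothetical and is not taken from the handbook, whose scripts parse additional dataclasses. The sketch assumes TRL is installed, and PEFT as well when `--use_peft` is passed.

```python
from typing import Optional, Sequence


def build_peft_config(argv: Optional[Sequence[str]] = None):
    """Parse ModelConfig flags and return a PEFT config, or None if PEFT is off.

    Hypothetical helper for illustration only. TRL is imported lazily so the
    function can be defined even where TRL is not installed.
    """
    from trl import ModelConfig, TrlParser, get_peft_config

    parser = TrlParser((ModelConfig,))
    # parse_args_and_config returns one instance per dataclass passed above;
    # with args=None it falls back to the process command line.
    (model_args,) = parser.parse_args_and_config(args=argv)
    # get_peft_config returns None unless use_peft is enabled.
    return get_peft_config(model_args)
```

A call such as `build_peft_config(["--use_peft", "--lora_r", "16"])` would be expected to return a LoRA configuration, while omitting `--use_peft` would return `None`.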

System Requirements

Category	Requirement	Notes
Python	>= 3.10.9	Required by the alignment-handbook package
Hardware	NVIDIA GPU	TRL trainers use PyTorch CUDA backend
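
The requirements in the table can be checked before launching a run. The following is a minimal sketch, not part of the handbook: `check_system` is a hypothetical helper, and the CUDA check is skipped (reported as `None`) when PyTorch is not importable.

```python
import sys

MIN_PYTHON = (3, 10, 9)  # minimum stated in the requirements table


def check_system(version_info=None):
    """Return a dict describing whether the Python and CUDA requirements are met."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:3])
    report = {"python_ok": current >= MIN_PYTHON, "cuda_available": None}
    try:
        import torch  # optional here: leave cuda_available as None if absent
    except ImportError:
        return report
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    print(check_system())
```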

Dependencies

Python Packages

`trl` >= 0.19.1
`transformers` >= 4.53.3 (peer dependency)
`accelerate` >= 1.9.0 (peer dependency)
`torch` >= 2.6.0 (peer dependency)
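
One way to confirm that an environment satisfies these minimums is to compare the installed versions against the list above. The sketch below uses only the standard library. The helper names are hypothetical, and the version comparison is deliberately simplified (it looks only at the leading numeric components), so `packaging.version` is the more robust choice where it is available.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the package list above.
MINIMUMS = {
    "trl": "0.19.1",
    "transformers": "4.53.3",
    "accelerate": "1.9.0",
    "torch": "2.6.0",
}


def release_tuple(version_string):
    """Turn '2.6.0+cu124' or '0.20.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version_string.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Simplified comparison on the numeric release components only."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_packages(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_packages().items():
        print(f"{package}: {status}")
```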

Credentials

No additional credentials beyond those in the PyTorch_CUDA environment.

Quick Install

# TRL is installed as part of the alignment-handbook
uv pip install .

# Or install TRL standalone
pip install "trl>=0.19.1"

Code Evidence

TRL version requirement from `setup.py:69`:

    "trl>=0.19.1",

TRL imports in `scripts/sft.py:48`:

from trl import ModelConfig, SFTTrainer, TrlParser, get_peft_config, setup_chat_format

TRL imports in `scripts/dpo.py:61`:

from trl import DPOTrainer, ModelConfig, TrlParser, get_peft_config

TRL imports in `scripts/orpo.py:62`:

from trl import ModelConfig, ORPOTrainer, TrlParser, get_peft_config

TRL base classes in `src/alignment/configs.py:33-34,57,134,143,152`:

import trl

# Class bodies are elided in this excerpt.
class ScriptArguments(trl.ScriptArguments): ...
class SFTConfig(trl.SFTConfig): ...
class DPOConfig(trl.DPOConfig): ...
class ORPOConfig(trl.ORPOConfig): ...

Common Errors

Error Message	Cause	Solution
`ImportError: cannot import name 'SFTTrainer' from 'trl'`	TRL version too old	Upgrade: `pip install "trl>=0.19.1"` (quote the specifier so the shell does not treat `>` as a redirect)
`AttributeError: 'SFTConfig' has no attribute 'packing_strategy'`	Installed TRL predates this config field	Upgrade to a recent TRL release that defines the field (needed for the SmolLM3 recipes)
`ImportError: cannot import name 'TrlParser' from 'trl'`	TRL version too old	Upgrade: `pip install "trl>=0.19.1"`
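
When one of these import errors appears, it can help to list every missing name in one pass rather than fixing them one at a time. The sketch below is a generic probe using only the standard library. `missing_names` is a hypothetical helper, and it treats any exception raised during attribute access as "missing", on the assumption that some packages resolve their attributes lazily.

```python
import importlib


def _resolves(module, name):
    """Return True if the attribute can be accessed without raising."""
    try:
        getattr(module, name)
    except Exception:
        return False
    return True


def missing_names(module_name, names):
    """Return the names that cannot be resolved on the given module."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)  # the module itself is not importable
    return [name for name in names if not _resolves(module, name)]
```

For the scripts listed under Code Evidence, a call such as `missing_names("trl", ["SFTTrainer", "DPOTrainer", "ORPOTrainer", "TrlParser", "get_peft_config"])` returning an empty list would indicate that the imports should succeed.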

Compatibility Notes

TRL version: The alignment-handbook subclasses TRL's config classes (ScriptArguments, SFTConfig, DPOConfig, ORPOConfig) to add fields. Ensure the installed TRL version is at least 0.19.1, the minimum declared in `setup.py`.
SmolLM3 recipes: Newer options such as `packing_strategy: ffd`, `padding_free: true`, `use_liger_kernel: true`, and `assistant_only_loss: true` require a recent TRL release. Upgrade TRL if these options are not recognised.
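
Whether a given config class knows about these options can be checked by inspecting its dataclass fields before starting a run. The sketch below is illustrative only: `unsupported_fields` is a hypothetical helper, and `OldStyleConfig` and `ExtendedConfig` are stand-in classes so the example runs without TRL installed. The subclass mirrors, in spirit, how the handbook adds fields on top of a base config.

```python
from dataclasses import dataclass, fields

# Option names taken from the SmolLM3 note above.
SMOLLM3_FIELDS = [
    "packing_strategy",
    "padding_free",
    "use_liger_kernel",
    "assistant_only_loss",
]


def unsupported_fields(config_cls, names):
    """Return the names that config_cls (a dataclass) does not define."""
    known = {f.name for f in fields(config_cls)}
    return [name for name in names if name not in known]


# Stand-in classes so the sketch runs without TRL; both are hypothetical.
@dataclass
class OldStyleConfig:
    packing: bool = False


@dataclass
class ExtendedConfig(OldStyleConfig):
    # A subclass adding one field; inherited fields remain visible to fields().
    packing_strategy: str = "ffd"
```

With TRL installed, `unsupported_fields(SFTConfig, SMOLLM3_FIELDS)` (after `from trl import SFTConfig`) should return an empty list on a release that supports all four options.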
