
Environment:Huggingface Alignment handbook Python TRL

From Leeroopedia
Revision as of 18:45, 16 February 2026 by Admin (talk | contribs) (Auto-imported from environments/Huggingface_Alignment_handbook_Python_TRL.md)


Knowledge Sources
Domains NLP, Deep_Learning, Training
Last Updated 2026-02-07 00:00 GMT

Overview

Python environment with TRL >= 0.19.1 providing SFTTrainer, DPOTrainer, ORPOTrainer, TrlParser, and PEFT integration utilities.

Description

TRL (Transformer Reinforcement Learning) is the core training library used by the alignment-handbook. It provides the trainer classes (SFTTrainer, DPOTrainer, ORPOTrainer), the configuration parser (TrlParser), and utility functions (get_peft_config, get_quantization_config, get_kbit_device_map, setup_chat_format). All three training scripts import directly from TRL.

Usage

Use this environment for any training script in the alignment-handbook. TRL is a mandatory dependency that provides the trainer implementations, config parsing, and LoRA/quantization utilities used by every workflow.

System Requirements

Category | Requirement | Notes
Python | >= 3.10.9 | Required by the alignment-handbook package
Hardware | NVIDIA GPU | TRL trainers use the PyTorch CUDA backend
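
The two requirements above can be checked before launching a training script. The following is a minimal sketch and not part of the alignment-handbook; the helper names `python_ok` and `cuda_available` are hypothetical, and the sketch assumes PyTorch may or may not be installed.

```python
import sys

MIN_PYTHON = (3, 10, 9)  # minimum Python version from the requirements above


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare (major, minor, micro) only.
    return tuple(version_info[:3]) >= minimum


def cuda_available():
    """Return True if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("Python >= 3.10.9:", python_ok())
    print("CUDA available:", cuda_available())
```

Importing `torch` inside the function keeps the check usable on machines where PyTorch is not yet installed.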

Dependencies

Python Packages

  • `trl` >= 0.19.1
  • `transformers` >= 4.53.3 (peer dependency)
  • `accelerate` >= 1.9.0 (peer dependency)
  • `torch` >= 2.6.0 (peer dependency)
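
The minimum versions listed above can be verified programmatically. This is a sketch under stated assumptions: the helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and only the leading numeric part of each version is compared, so pre-release and local tags such as `rc1` or `+cu124` are ignored.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "trl": "0.19.1",
    "transformers": "4.53.3",
    "accelerate": "1.9.0",
    "torch": "2.6.0",
}


def release_tuple(version):
    """Parse the leading numeric release of a version string into a tuple."""
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare numeric release segments, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (need >= {minimum})"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for package, problem in check_environment().items():
        print(f"{package}: {problem}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `0.9.6` as newer than `0.19.1`.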

Credentials

No additional credentials beyond those in the PyTorch_CUDA environment.

Quick Install

# TRL is installed as part of the alignment-handbook
uv pip install .

# Or install TRL standalone
# (quote the requirement so the shell does not treat ">=" as a redirect)
pip install "trl>=0.19.1"

Code Evidence

TRL version requirement from `setup.py:69`:

    "trl>=0.19.1",

TRL imports in `scripts/sft.py:48`:

from trl import ModelConfig, SFTTrainer, TrlParser, get_peft_config, setup_chat_format

TRL imports in `scripts/dpo.py:61`:

from trl import DPOTrainer, ModelConfig, TrlParser, get_peft_config

TRL imports in `scripts/orpo.py:62`:

from trl import ModelConfig, ORPOTrainer, TrlParser, get_peft_config

TRL base classes in `src/alignment/configs.py:33-34,57,134,143,152`:

import trl

# Class bodies are omitted in this excerpt.
class ScriptArguments(trl.ScriptArguments): ...
class SFTConfig(trl.SFTConfig): ...
class DPOConfig(trl.DPOConfig): ...
class ORPOConfig(trl.ORPOConfig): ...
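
The subclassing pattern listed above can be illustrated without installing TRL. In this sketch `BaseSFTConfig` is a stand-in for `trl.SFTConfig`, assuming the TRL config classes are dataclasses. The added field `example_extra_field` is hypothetical and chosen only for illustration; the actual fields are defined in `src/alignment/configs.py`.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class BaseSFTConfig:
    """Stand-in for trl.SFTConfig so the sketch runs without TRL installed."""
    learning_rate: float = 2e-5
    max_length: int = 1024


@dataclass
class SFTConfig(BaseSFTConfig):
    """Subclass that inherits every base field and adds one of its own."""
    example_extra_field: Optional[str] = field(
        default=None,
        metadata={"help": "Hypothetical field added by the subclass."},
    )


cfg = SFTConfig(learning_rate=1e-5, example_extra_field="demo")
field_names = [f.name for f in fields(cfg)]
print(field_names)  # ['learning_rate', 'max_length', 'example_extra_field']
```

Because the subclass inherits the base fields, removing or renaming a field in the base class changes the subclass as well, which is one reason the installed TRL version has to match what the handbook expects.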

Common Errors

Error Message | Cause | Solution
`ImportError: cannot import name 'SFTTrainer' from 'trl'` | TRL version too old | Upgrade: `pip install "trl>=0.19.1"`
`AttributeError: 'SFTConfig' has no attribute 'packing_strategy'` | TRL version missing newer features | Upgrade TRL to the latest version (needed for the SmolLM3 recipes)
`ImportError: cannot import name 'TrlParser' from 'trl'` | TRL version too old | Upgrade: `pip install "trl>=0.19.1"`
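
The import errors above can be detected in one pass before a script starts. This is a minimal sketch: the helper `missing_names` and the `REQUIRED` list are hypothetical, with the names taken from the imports shown under Code Evidence.

```python
import importlib


def has_name(module, name):
    """Return True if the attribute can be resolved on the module.

    Some packages load attributes lazily, so access may raise errors
    other than AttributeError; any failure is treated as "missing".
    """
    try:
        getattr(module, name)
    except Exception:
        return False
    return True


def missing_names(module_name, names):
    """Return the names that cannot be resolved on the named module.

    If the module itself cannot be imported, every name is reported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not has_name(module, name)]


REQUIRED = ["SFTTrainer", "DPOTrainer", "ORPOTrainer", "TrlParser"]

if __name__ == "__main__":
    missing = missing_names("trl", REQUIRED)
    if missing:
        print("Missing from trl:", ", ".join(missing))
    else:
        print("All required TRL names are importable.")
```

A non-empty result suggests the installed TRL is older than the required version, or that TRL is not installed at all.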

Compatibility Notes

  • TRL version: The alignment-handbook extends TRL's config classes (ScriptArguments, SFTConfig, DPOConfig, ORPOConfig) with additional fields. Ensure the installed TRL version meets or exceeds the minimum requirement (>= 0.19.1).
  • SmolLM3 recipes: Newer options such as `packing_strategy: ffd`, `padding_free: true`, `use_liger_kernel: true`, and `assistant_only_loss: true` require a recent TRL release. If a recipe fails on one of these keys, upgrade TRL to the latest version.
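
Whether a config class supports the recipe options named above can be checked by inspecting its declared fields. The sketch below assumes the config class is a dataclass; `unsupported_options` is a hypothetical helper, and `OldStyleConfig` is a stand-in for an older config class, not a real TRL class.

```python
from dataclasses import dataclass, fields, is_dataclass

# Option names taken from the SmolLM3 note above.
RECIPE_OPTIONS = [
    "packing_strategy",
    "padding_free",
    "use_liger_kernel",
    "assistant_only_loss",
]


def unsupported_options(config_cls, options):
    """Return the options that config_cls does not declare as fields."""
    if not is_dataclass(config_cls):
        raise TypeError(f"{config_cls!r} is not a dataclass")
    declared = {f.name for f in fields(config_cls)}
    return [name for name in options if name not in declared]


@dataclass
class OldStyleConfig:
    """Stand-in for an older config class that lacks the newer options."""
    packing: bool = False
    padding_free: bool = False


print(unsupported_options(OldStyleConfig, RECIPE_OPTIONS))
# ['packing_strategy', 'use_liger_kernel', 'assistant_only_loss']
```

With TRL installed, the same call could be made as `unsupported_options(trl.SFTConfig, RECIPE_OPTIONS)`, again assuming `trl.SFTConfig` is a dataclass.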
