Environment:LLMBook_zh_LLMBook_zh_github_io_HuggingFace_Transformers_Stack
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, NLP, LLMs |
| Last Updated | 2026-02-08 04:30 GMT |
Overview
Hugging Face ecosystem environment including transformers, peft, trl, and datasets libraries for LLM training, fine-tuning, and alignment.
Description
This environment provides the Hugging Face software stack used across all training and fine-tuning workflows. The `transformers` library supplies `AutoModelForCausalLM`, `AutoTokenizer`, `Trainer`, `TrainingArguments`, and `HfArgumentParser`, which are used in the pre-training, SFT, LoRA, and DPO scripts. The `peft` library provides `LoraConfig` and `get_peft_model` for parameter-efficient fine-tuning. The `trl` library provides `DPOTrainer` for preference alignment. The `datasets` library handles data loading via `load_dataset`. FlashAttention 2 is enabled via `attn_implementation="flash_attention_2"`.
Usage
Use this environment for all model loading, training, fine-tuning, and alignment workflows. It is required by every script that loads models with `AutoModelForCausalLM.from_pretrained()` or trains with `Trainer`/`DPOTrainer`.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Full support for all HF libraries |
| Hardware | NVIDIA GPU | Required for FlashAttention 2 (Ampere+ architecture) |
| Python | Python >= 3.8 | Required by transformers |
| Disk | 30GB+ | For cached model weights from Hugging Face Hub |
Dependencies
Python Packages
- `transformers` >= 4.30
- `peft` >= 0.4.0
- `trl` >= 0.5.0
- `datasets` >= 2.14
- `accelerate` >= 0.20
- `flash-attn` >= 2.0 (for FlashAttention 2 support)
- `deepspeed` >= 0.9 (optional, for distributed training)
Credentials
The following environment variables may be needed:
- `HF_TOKEN`: Hugging Face API token for accessing gated models (e.g., LLaMA-2 requires access approval).
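One way to use the token, sketched below, is to read `HF_TOKEN` from the environment and forward it explicitly to `from_pretrained`, rather than relying on a cached `huggingface-cli login`. The gated-model call is shown commented out because it additionally requires license acceptance on the Hub.

```python
# Sketch: forward HF_TOKEN explicitly when loading a gated model.
import os

token = os.environ.get("HF_TOKEN")  # None if the variable is unset

# With transformers installed and Hub access approved, the token is
# forwarded like this:
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-2-7b-hf", token=token
# )
```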
Quick Install
# Install the full Hugging Face stack
pip install transformers peft trl datasets accelerate
# For FlashAttention 2 support (requires CUDA)
pip install flash-attn --no-build-isolation
# Optional: DeepSpeed for distributed training
pip install deepspeed
Code Evidence
Transformers imports from `code/6.2 预训练实践.py:3-9`:
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
HfArgumentParser,
TrainingArguments,
Trainer,
)
FlashAttention 2 usage from `code/6.2 预训练实践.py:55`:
model = AutoModelForCausalLM.from_pretrained(
args.model_name_or_path, attn_implementation="flash_attention_2"
)
PEFT imports from `code/7.4 LoRA实践.py:3-8`:
from peft import (
LoraConfig,
TaskType,
AutoPeftModelForCausalLM,
get_peft_model,
)
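Those imports are typically combined as follows. This is a configuration sketch only: the rank, alpha, dropout, and `target_modules` values here are illustrative defaults, not the settings used in the book's LoRA script.

```python
# Sketch of wiring LoraConfig into a causal-LM model.
# All hyperparameter values below are illustrative, not the book's settings.
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                      # LoRA rank
    lora_alpha=32,            # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention projections
)

# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()  # confirms only adapter weights train
```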
DeepSpeed integration from `code/7.4 LoRA实践.py:9-12`:
from transformers.integrations.deepspeed import (
is_deepspeed_zero3_enabled,
unset_hf_deepspeed_config,
)
TRL DPOTrainer from `code/8.2 DPO实践.py:5`:
from trl import DPOTrainer
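`DPOTrainer` consumes preference pairs, i.e. rows with `prompt`, `chosen`, and `rejected` fields. The sketch below shows that record shape plus the trainer wiring; the wiring is commented out because the exact keyword set (`beta`, `tokenizer` vs. newer config objects) varies across `trl` versions, and the values shown are illustrative.

```python
# Sketch: the preference-pair record format DPOTrainer consumes, plus the
# trainer wiring (commented out; keyword names vary by trl version).
preference_example = {
    "prompt": "What is LoRA?",
    "chosen": "LoRA adds trainable low-rank adapters to frozen weights.",
    "rejected": "LoRA is a kind of radio protocol.",
}

# from trl import DPOTrainer
# trainer = DPOTrainer(
#     model=model,                  # policy to optimize
#     ref_model=ref_model,          # frozen reference for the KL term
#     beta=0.1,                     # preference-loss strength (illustrative)
#     train_dataset=train_dataset,  # rows shaped like preference_example
#     tokenizer=tokenizer,
# )
# trainer.train()
```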
Datasets library from `code/6.3 预训练数据类.py:2`:
from datasets import load_dataset
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `OSError: meta-llama/Llama-2-7b-hf is gated` | Model requires access approval | Accept the license on the Hugging Face Hub and set `HF_TOKEN` |
| `ImportError: flash_attn not found` | FlashAttention not installed | `pip install flash-attn --no-build-isolation` |
| `ImportError: peft not found` | PEFT library not installed | `pip install peft` |
| `ImportError: trl not found` | TRL library not installed | `pip install trl` |
Compatibility Notes
- FlashAttention 2: supported only on Ampere (A100/A10) and newer NVIDIA GPUs. On older hardware, do not rely on an automatic fallback; omit `attn_implementation="flash_attention_2"` (or pass `"sdpa"`) so the default attention implementation is used.
- DeepSpeed Zero-3: when merging LoRA adapters after Zero-3 training, call `unset_hf_deepspeed_config()` first (see `code/7.4 LoRA实践.py:47`).
- GPTQConfig: the GPTQ quantization workflow additionally requires the `auto-gptq` package.
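The Zero-3 merge constraint noted above can be sketched as follows. The checkpoint paths are placeholders, and the load/merge calls are commented out since they need a real adapter checkpoint on disk; the ordering, clearing the DeepSpeed config before reloading, is the point.

```python
# Sketch of the post-Zero-3 LoRA merge flow; paths are placeholders.
# The stale DeepSpeed config must be cleared before the adapter checkpoint
# can be reloaded outside the Zero-3 partitioned context.
from transformers.integrations.deepspeed import (
    is_deepspeed_zero3_enabled,
    unset_hf_deepspeed_config,
)

if is_deepspeed_zero3_enabled():
    unset_hf_deepspeed_config()

# from peft import AutoPeftModelForCausalLM
# model = AutoPeftModelForCausalLM.from_pretrained("output/lora-adapter")  # placeholder
# merged = model.merge_and_unload()  # fold LoRA weights into the base model
# merged.save_pretrained("output/merged")
```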
Related Pages
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_Pretraining
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Train_Pretraining
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Save_Model_Pretraining
- Implementation:LLMBook_zh_LLMBook_zh_github_io_PTDataset
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_SFT
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Train_SFT
- Implementation:LLMBook_zh_LLMBook_zh_github_io_DataCollatorForSupervisedDataset
- Implementation:LLMBook_zh_LLMBook_zh_github_io_LoraConfig_Get_Peft_Model
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Trainer_Train_LoRA
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoPeftModelForCausalLM_Merge_And_Unload
- Implementation:LLMBook_zh_LLMBook_zh_github_io_LlamaRewardModel
- Implementation:LLMBook_zh_LLMBook_zh_github_io_Get_Data_DPO
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_DPO
- Implementation:LLMBook_zh_LLMBook_zh_github_io_DPOTrainer_Train
- Implementation:LLMBook_zh_LLMBook_zh_github_io_AutoModelForCausalLM_From_Pretrained_Bitsandbytes
- Implementation:LLMBook_zh_LLMBook_zh_github_io_GPTQConfig_Quantization