Environment:Hiyouga LLaMA Factory Core Python GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-06 20:00 GMT |
Overview
Python 3.11+ environment with PyTorch 2.4+, Transformers 4.51+, and GPU acceleration (CUDA/NPU/XPU/MPS) for LLM fine-tuning and inference.
Description
This environment provides the core runtime for LLaMA Factory. It requires Python 3.11 or higher and a modern PyTorch installation (2.4+). The framework supports multiple accelerator backends: NVIDIA CUDA, Huawei Ascend NPU, Intel XPU, and Apple MPS. Device selection is automatic via transformers.utils detection functions. The software stack includes HuggingFace Transformers, Datasets, Accelerate, PEFT, and TRL as mandatory dependencies with strict version pinning.
Usage
Use this environment for all LLaMA Factory operations: training (SFT, DPO, KTO, PPO, RM, PT), inference (CLI, API, WebUI), evaluation, and model export. This is the mandatory base prerequisite for every workflow in the repository.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Windows not officially supported; macOS via MPS |
| Python | >= 3.11.0 | Supports 3.11, 3.12, 3.13 |
| Hardware | GPU with >= 8GB VRAM | NVIDIA (CUDA), Huawei Ascend (NPU), Intel (XPU), or Apple (MPS) |
| Disk | >= 20GB SSD | For model weights and dataset caching |
Dependencies
Core Packages
torch>= 2.4.0torchvision>= 0.19.0torchaudio>= 2.4.0transformers>= 4.51.0, <= 5.0.0 (excluding 4.52.0 and 4.57.0)datasets>= 2.16.0, <= 4.0.0accelerate>= 1.3.0, <= 1.11.0peft>= 0.18.0, <= 0.18.1trl>= 0.18.0, <= 0.24.0torchdata>= 0.10.0, <= 0.11.0
GUI and Visualization
gradio>= 4.38.0, <= 5.50.0matplotlib>= 3.7.0tyro< 0.9.0
Operations
einopsnumpypandasscipy
Model and Tokenizer
sentencepiecetiktokenmodelscopehf-transfersafetensors
API Server
uvicornfastapisse-starlette
Python Utilities
av>= 10.0.0, <= 16.0.0fireomegaconfpackagingprotobufpyyamlpydantic
Credentials
The following environment variables may be required depending on usage:
HF_TOKEN: HuggingFace API token for accessing gated models and datasets.USE_MODELSCOPE_HUB: Set to1to download models from ModelScope instead of HuggingFace.USE_OPENMIND_HUB: Set to1to download models from OpenMind Hub.LLAMAFACTORY_VERBOSITY: Logging verbosity level (default:INFO).DISABLE_VERSION_CHECK: Set to1to skip dependency version validation.ALLOW_EXTRA_ARGS: Set to1to allow unrecognized CLI arguments.
Quick Install
pip install llamafactory
# Or install from source:
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
Code Evidence
Dependency version checking from src/llamafactory/extras/misc.py:95-101:
def check_dependencies() -> None:
r"""Check the version of the required packages."""
check_version("transformers>=4.51.0,<=5.0.0")
check_version("datasets>=2.16.0,<=4.0.0")
check_version("accelerate>=1.3.0,<=1.11.0")
check_version("peft>=0.18.0,<=0.18.1")
check_version("trl>=0.18.0,<=0.24.0")
Device auto-detection from src/llamafactory/extras/misc.py:144-157:
def get_current_device() -> "torch.device":
r"""Get the current available device."""
if is_torch_xpu_available():
device = "xpu:{}".format(os.getenv("LOCAL_RANK", "0"))
elif is_torch_npu_available():
device = "npu:{}".format(os.getenv("LOCAL_RANK", "0"))
elif is_torch_mps_available():
device = "mps:{}".format(os.getenv("LOCAL_RANK", "0"))
elif is_torch_cuda_available():
device = "cuda:{}".format(os.getenv("LOCAL_RANK", "0"))
else:
device = "cpu"
return torch.device(device)
Python version requirement from pyproject.toml:11:
requires-python = ">=3.11.0"
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
transformers>=4.51.0,<=5.0.0 is required |
Wrong transformers version | pip install transformers>=4.51.0,<=5.0.0
|
Please launch distributed training with llamafactory-cli or torchrun |
Single-process mode without KTransformers | Use llamafactory-cli train instead of direct python
|
Please use FORCE_TORCHRUN=1 to launch DeepSpeed training |
DeepSpeed without torchrun | Set FORCE_TORCHRUN=1 environment variable
|
Please specify max_steps in streaming mode |
Streaming dataset without max_steps | Add max_steps to training config
|
Compatibility Notes
- NVIDIA CUDA: Primary supported platform. Supports fp16 and bf16 mixed precision.
- Huawei Ascend NPU: Supported with
torch_npu. JIT compilation is disabled by default (setNPU_JIT_COMPILE=1to enable). Usesspawnmultiprocessing method for vLLM workers. - Intel XPU: Supported via
torch.xpu. - Apple MPS: Basic support. Bf16 availability depends on hardware.
- CPU: Fallback when no accelerator is detected. No mixed precision support.