Environment:Huggingface Diffusers PyTorch CUDA Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
Core runtime environment for Huggingface Diffusers: Python 3.9+, PyTorch >= 1.4 (2.0+ for scaled dot-product attention), with optional CUDA/XPU/MPS/NPU/MLU GPU acceleration.
Description
This environment provides the foundational runtime for all Diffusers operations. PyTorch is the primary deep learning framework, with device selection following a priority order: CUDA > NPU > XPU > MPS > MLU > CPU. The library uses PyTorch's native scaled dot-product attention (SDPA) as the default attention backend, which requires PyTorch >= 2.0. For advanced attention processors (AttnProcessor2_0, JointAttnProcessor2_0, FusedAttnProcessor2_0, etc.), PyTorch >= 2.0 is a hard requirement. The flex_attention backend requires PyTorch >= 2.5.0, and torch.library.custom_op support requires PyTorch >= 2.4.0.
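The version gates above (SDPA at 2.0, `torch.library.custom_op` at 2.4.0, flex_attention at 2.5.0) can be sketched as a small helper. This is illustrative only; `available_backends` and the `"eager"` label are not Diffusers APIs:

```python
def available_backends(torch_version):
    """Map a PyTorch version tuple to the attention backends it can support.

    Illustrative sketch of the version gates described above, not a
    Diffusers API. `torch_version` is e.g. (2, 5, 0).
    """
    backends = ["eager"]  # plain (non-fused) attention always works
    if torch_version >= (2, 0, 0):
        backends.append("sdpa")           # default backend via F.scaled_dot_product_attention
    if torch_version >= (2, 4, 0):
        backends.append("custom_op")      # torch.library.custom_op support
    if torch_version >= (2, 5, 0):
        backends.append("flex_attention")
    return backends

print(available_backends((2, 5, 0)))   # ['eager', 'sdpa', 'custom_op', 'flex_attention']
print(available_backends((1, 13, 1)))  # ['eager']
```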
Usage
Required for all Diffusers workflows including text-to-image inference, LoRA fine-tuning, DreamBooth personalization, ControlNet guided generation, video generation, model quantization, and checkpoint conversion. Every Implementation page in this wiki depends on this environment.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (recommended), macOS, Windows | Linux for full CUDA support |
| Hardware | NVIDIA GPU (CUDA), Intel XPU, Apple MPS, Huawei NPU, Cambricon MLU, or CPU | GPU strongly recommended for inference; required for training |
| Disk | 10GB+ free space | For model weights (varies per model: SD 1.5 ~4GB, SDXL ~7GB, Flux ~24GB) |
Dependencies
System Packages
- CUDA toolkit (for NVIDIA GPUs) — version matching your PyTorch build
- cuDNN (bundled with CUDA toolkit)
Python Packages
- `torch` >= 1.4 (>= 2.0 required for default attention processors)
- `huggingface-hub` >= 0.34.0, < 2.0
- `safetensors` >= 0.3.1
- `numpy`
- `Pillow`
- `requests`
- `filelock`
- `regex` != 2019.12.17
- `httpx` < 1.0.0
- `importlib_metadata`
Credentials
The following environment variables may be needed:
- `HF_TOKEN`: HuggingFace API token — required for gated models (e.g., Stable Diffusion, Flux)
- `HF_ENDPOINT`: Custom HuggingFace Hub endpoint (default: `https://huggingface.co`)
- `HF_HOME`: HuggingFace cache directory
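A minimal stdlib-only sketch of resolving these variables before loading gated models; the `~/.cache/huggingface` fallback below is an assumption matching huggingface_hub's documented default, not something read from this page:

```python
import os

# Resolve Hub-related settings the way a launch script might.
hf_token = os.environ.get("HF_TOKEN")  # None if not set
hf_endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
# Assumed default cache location (huggingface_hub's documented fallback).
hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))

if hf_token is None:
    print("HF_TOKEN not set; downloads of gated models will fail")
print(f"Hub endpoint: {hf_endpoint}")
print(f"Cache dir:    {hf_home}")
```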
Quick Install
# Install diffusers with PyTorch support
pip install "diffusers[torch]" transformers accelerate safetensors  # quotes keep zsh from globbing the brackets
# For CUDA GPU support (if not already installed)
pip install torch --index-url https://download.pytorch.org/whl/cu121
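After installing, a quick stdlib-only check confirms the packages are importable. `check_install` is a hypothetical helper for illustration, not part of Diffusers:

```python
import importlib.util

def check_install(packages):
    """Return {package: bool} indicating which top-level packages are importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

status = check_install(["torch", "diffusers", "transformers", "accelerate", "safetensors"])
for pkg, ok in status.items():
    print(f"{pkg}: {'OK' if ok else 'MISSING'}")
```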
Code Evidence
Device selection priority from `torch_utils.py:280-293`:
@functools.lru_cache
def get_device():
    if torch.cuda.is_available():
        return "cuda"
    elif is_torch_npu_available():
        return "npu"
    elif hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    elif torch.backends.mps.is_available():
        return "mps"
    elif is_torch_mlu_available():
        return "mlu"
    else:
        return "cpu"
PyTorch 2.0 requirement for attention processors from `attention_processor.py:1834`:
class AttnProcessor2_0:
    """Processor for implementing scaled dot-product attention (enabled by default if you're using PyTorch 2.0)."""

    def __init__(self):
        if not hasattr(F, "scaled_dot_product_attention"):
            raise ImportError("AttnProcessor2_0 requires PyTorch 2.0, ...")
PyTorch version guard for custom_op from `attention_dispatch.py:139-158`:
if torch.__version__ >= "2.4.0":
    _custom_op = torch.library.custom_op
    _register_fake = torch.library.register_fake
else:
    # No-op fallbacks for older PyTorch versions
    def custom_op_no_op(name, fn=None, /, *, mutates_args, ...):
        def wrap(func):
            return func

        return wrap if fn is None else fn
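Note that a raw string comparison like `torch.__version__ >= "2.4.0"` is lexicographic and mis-orders multi-digit components (e.g. 2.10.0). Parsing the version first avoids this; `parse_version` below is a hypothetical sketch, and real code should prefer `packaging.version.parse`:

```python
def parse_version(v):
    """Parse a simple version string like '2.10.0+cu121' into (2, 10, 0).

    Hypothetical helper for illustration only: it ignores local build tags
    after '+' and does not handle pre-release suffixes such as '2.4.0a0'.
    """
    core = v.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])

# Lexicographic string comparison gets multi-digit components wrong:
print("2.10.0" >= "2.4.0")                                # False (wrong)
print(parse_version("2.10.0") >= parse_version("2.4.0"))  # True (correct)
```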
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AttnProcessor2_0 requires PyTorch 2.0` | PyTorch < 2.0 installed | `pip install -U torch` |
| `enable_model_cpu_offload requires accelerator, but not found` | No GPU detected when using CPU offloading | Ensure GPU drivers are installed; use CPU-only pipelines instead |
| `Cannot generate a {device} tensor from a generator of type cuda` | Generator device mismatch | Create generator on same device as target tensor |
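The generator mismatch in the last row arises when a `torch.Generator` lives on a different device than the tensors it seeds. A minimal sketch of the fix, falling back to CPU so it runs anywhere:

```python
import torch

# Pick whichever accelerator is present; "cpu" keeps the sketch portable.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the generator on the same device where tensors will be sampled;
# mixing a CUDA generator with CPU tensor creation raises the error above.
generator = torch.Generator(device=device).manual_seed(0)
noise = torch.randn(2, 3, device=device, generator=generator)
print(noise.shape)  # torch.Size([2, 3])
```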
Compatibility Notes
- NVIDIA CUDA: Primary supported accelerator. Required for BitsAndBytes quantization.
- Intel XPU: Supported via `torch.xpu` (requires recent PyTorch builds).
- Apple MPS: Supported for inference. Training support is limited (`BACKEND_SUPPORTS_TRAINING["mps"] = False`).
- Huawei NPU: Supported via `torch_npu` extension. Has dedicated attention fusion (`npu_fusion_attention`).
- Cambricon MLU: Supported via `torch_mlu` extension.
- CPU: Always available as fallback. Significantly slower for inference.
- Deterministic mode: Sets `CUDA_LAUNCH_BLOCKING=1` and `CUBLAS_WORKSPACE_CONFIG=:16:8`.
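The deterministic-mode variables from the last bullet can be set from Python, but must be exported before any CUDA/cuBLAS work begins or they are ignored. A minimal sketch (setting environment variables only; it does not itself enable determinism in PyTorch):

```python
import os

# Set before importing torch or launching any CUDA kernels.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"         # synchronous kernel launches
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"  # deterministic cuBLAS workspace
```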
Related Pages
- Implementation:Huggingface_Diffusers_DiffusionPipeline_From_Pretrained
- Implementation:Huggingface_Diffusers_Enable_Model_Cpu_Offload
- Implementation:Huggingface_Diffusers_SDXL_Pipeline_Call
- Implementation:Huggingface_Diffusers_AutoencoderKL_Decode
- Implementation:Huggingface_Diffusers_ModelMixin_From_Pretrained
- Implementation:Huggingface_Diffusers_LoRA_Training_Loop
- Implementation:Huggingface_Diffusers_WanTransformer3DModel_Forward
- Implementation:Huggingface_Diffusers_Video_Pipeline_From_Pretrained