Environment:Huggingface Diffusers PyTorch CUDA Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
Core runtime environment for Huggingface Diffusers: Python 3.9+, PyTorch >= 1.4 (2.0+ for scaled dot-product attention), with optional CUDA/XPU/MPS/NPU/MLU GPU acceleration.
Description
This environment provides the foundational runtime for all Diffusers operations. PyTorch is the primary deep learning framework, with device selection following a priority order: CUDA > NPU > XPU > MPS > MLU > CPU. The library uses PyTorch's native scaled dot-product attention (SDPA) as the default attention backend, which requires PyTorch >= 2.0. For advanced attention processors (AttnProcessor2_0, JointAttnProcessor2_0, FusedAttnProcessor2_0, etc.), PyTorch >= 2.0 is a hard requirement. The flex_attention backend requires PyTorch >= 2.5.0, and torch.library.custom_op support requires PyTorch >= 2.4.0.
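The version gates above (SDPA at 2.0, `torch.library.custom_op` at 2.4.0, flex_attention at 2.5.0) can be sketched as a small helper. This is illustrative only; `available_backends` and the `"eager"` label are not Diffusers APIs:

```python
def available_backends(torch_version):
    """Map a PyTorch version tuple to the attention backends it can support.

    Illustrative sketch of the version gates described above, not a
    Diffusers API. `torch_version` is e.g. (2, 5, 0).
    """
    backends = ["eager"]  # plain (non-fused) attention always works
    if torch_version >= (2, 0, 0):
        backends.append("sdpa")           # default backend via F.scaled_dot_product_attention
    if torch_version >= (2, 4, 0):
        backends.append("custom_op")      # torch.library.custom_op support
    if torch_version >= (2, 5, 0):
        backends.append("flex_attention")
    return backends

print(available_backends((2, 5, 0)))   # ['eager', 'sdpa', 'custom_op', 'flex_attention']
print(available_backends((1, 13, 1)))  # ['eager']
```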
Usage
Required for all Diffusers workflows including text-to-image inference, LoRA fine-tuning, DreamBooth personalization, ControlNet guided generation, video generation, model quantization, and checkpoint conversion. Every Implementation page in this wiki depends on this environment.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (recommended), macOS, Windows | Linux for full CUDA support |
| Hardware | NVIDIA GPU (CUDA), Intel XPU, Apple MPS, Huawei NPU, Cambricon MLU, or CPU | GPU strongly recommended for inference; required for training |
| Disk | 10GB+ free space | For model weights (varies per model: SD 1.5 ~4GB, SDXL ~7GB, Flux ~24GB) |
Dependencies
System Packages
- CUDA toolkit (for NVIDIA GPUs) — version matching your PyTorch build
- cuDNN (bundled with CUDA toolkit)
Python Packages
- `torch` >= 1.4 (>= 2.0 required for default attention processors)
- `huggingface-hub` >= 0.34.0, < 2.0
- `safetensors` >= 0.3.1
- `numpy`
- `Pillow`
- `requests`
- `filelock`
- `regex` != 2019.12.17
- `httpx` < 1.0.0
- `importlib_metadata`
Credentials
The following environment variables may be needed:
- `HF_TOKEN`: HuggingFace API token — required for gated models (e.g., Stable Diffusion, Flux)
- `HF_ENDPOINT`: Custom HuggingFace Hub endpoint (default: `https://huggingface.co`)
- `HF_HOME`: HuggingFace cache directory
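A minimal stdlib-only sketch of resolving these variables before loading gated models; the `~/.cache/huggingface` fallback below is an assumption matching huggingface_hub's documented default, not something read from this page:

```python
import os

# Resolve Hub-related settings the way a launch script might.
hf_token = os.environ.get("HF_TOKEN")  # None if not set
hf_endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
# Assumed default cache location (huggingface_hub's documented fallback).
hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))

if hf_token is None:
    print("HF_TOKEN not set; downloads of gated models will fail")
print(f"Hub endpoint: {hf_endpoint}")
print(f"Cache dir:    {hf_home}")
```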
Quick Install
# Install diffusers with PyTorch support
pip install "diffusers[torch]" transformers accelerate safetensors  # quotes keep zsh from globbing the brackets
# For CUDA GPU support (if not already installed)
pip install torch --index-url https://download.pytorch.org/whl/cu121
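After installing, a quick stdlib-only check confirms the packages are importable. `check_install` is a hypothetical helper for illustration, not part of Diffusers:

```python
import importlib.util

def check_install(packages):
    """Return {package: bool} indicating which top-level packages are importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

status = check_install(["torch", "diffusers", "transformers", "accelerate", "safetensors"])
for pkg, ok in status.items():
    print(f"{pkg}: {'OK' if ok else 'MISSING'}")
```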
Code Evidence
Device selection priority from `torch_utils.py:280-293`:
@functools.lru_cache
def get_device():
    if torch.cuda.is_available():
        return "cuda"
    elif is_torch_npu_available():
        return "npu"
    elif hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    elif torch.backends.mps.is_available():
        return "mps"
    elif is_torch_mlu_available():
        return "mlu"
    else:
        return "cpu"
PyTorch 2.0 requirement for attention processors from `attention_processor.py:1834`:
class AttnProcessor2_0:
    """Processor for implementing scaled dot-product attention (enabled by default if you're using PyTorch 2.0)."""

    def __init__(self):
        if not hasattr(F, "scaled_dot_product_attention"):
            raise ImportError("AttnProcessor2_0 requires PyTorch 2.0, ...")
PyTorch version guard for custom_op from `attention_dispatch.py:139-158`:
if torch.__version__ >= "2.4.0":
    _custom_op = torch.library.custom_op
    _register_fake = torch.library.register_fake
else:
    # No-op fallbacks for older PyTorch versions
    def custom_op_no_op(name, fn=None, /, *, mutates_args, ...):
        def wrap(func):
            return func

        return wrap if fn is None else fn
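Note that a raw string comparison like `torch.__version__ >= "2.4.0"` is lexicographic and mis-orders multi-digit components (e.g. 2.10.0). Parsing the version first avoids this; `parse_version` below is a hypothetical sketch, and real code should prefer `packaging.version.parse`:

```python
def parse_version(v):
    """Parse a simple version string like '2.10.0+cu121' into (2, 10, 0).

    Hypothetical helper for illustration only: it ignores local build tags
    after '+' and does not handle pre-release suffixes such as '2.4.0a0'.
    """
    core = v.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])

# Lexicographic string comparison gets multi-digit components wrong:
print("2.10.0" >= "2.4.0")                                # False (wrong)
print(parse_version("2.10.0") >= parse_version("2.4.0"))  # True (correct)
```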
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AttnProcessor2_0 requires PyTorch 2.0` | PyTorch < 2.0 installed | `pip install -U torch` |
| `enable_model_cpu_offload requires accelerator, but not found` | No GPU detected when using CPU offloading | Ensure GPU drivers are installed; use CPU-only pipelines instead |
| `Cannot generate a {device} tensor from a generator of type cuda` | Generator device mismatch | Create generator on same device as target tensor |
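The generator mismatch in the last row arises when a `torch.Generator` lives on a different device than the tensors it seeds. A minimal sketch of the fix, falling back to CPU so it runs anywhere:

```python
import torch

# Pick whichever accelerator is present; "cpu" keeps the sketch portable.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the generator on the same device where tensors will be sampled;
# mixing a CUDA generator with CPU tensor creation raises the error above.
generator = torch.Generator(device=device).manual_seed(0)
noise = torch.randn(2, 3, device=device, generator=generator)
print(noise.shape)  # torch.Size([2, 3])
```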
Compatibility Notes
- NVIDIA CUDA: Primary supported accelerator. Required for BitsAndBytes quantization.
- Intel XPU: Supported via `torch.xpu` (requires recent PyTorch builds).
- Apple MPS: Supported for inference. Training support is limited (`BACKEND_SUPPORTS_TRAINING["mps"] = False`).
- Huawei NPU: Supported via `torch_npu` extension. Has dedicated attention fusion (`npu_fusion_attention`).
- Cambricon MLU: Supported via `torch_mlu` extension.
- CPU: Always available as fallback. Significantly slower for inference.
- Deterministic mode: Sets `CUDA_LAUNCH_BLOCKING=1` and `CUBLAS_WORKSPACE_CONFIG=:16:8`.
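The deterministic-mode variables from the last bullet can be set from Python, but must be exported before any CUDA/cuBLAS work begins or they are ignored. A minimal sketch (setting environment variables only; it does not itself enable determinism in PyTorch):

```python
import os

# Set before importing torch or launching any CUDA kernels.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"         # synchronous kernel launches
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"  # deterministic cuBLAS workspace
```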
Related Pages
- Implementation:Huggingface_Diffusers_DiffusionPipeline_From_Pretrained
- Implementation:Huggingface_Diffusers_Enable_Model_Cpu_Offload
- Implementation:Huggingface_Diffusers_SDXL_Pipeline_Call
- Implementation:Huggingface_Diffusers_AutoencoderKL_Decode
- Implementation:Huggingface_Diffusers_ModelMixin_From_Pretrained
- Implementation:Huggingface_Diffusers_LoRA_Training_Loop
- Implementation:Huggingface_Diffusers_WanTransformer3DModel_Forward
- Implementation:Huggingface_Diffusers_Video_Pipeline_From_Pretrained