
Environment:Huggingface Diffusers PyTorch CUDA Runtime

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Deep_Learning
Last Updated 2026-02-13 21:00 GMT

Overview

Core runtime environment for Huggingface Diffusers: Python 3.9+, PyTorch >= 1.4 (2.0+ for scaled dot-product attention), with optional CUDA/XPU/MPS/NPU/MLU GPU acceleration.

Description

This environment provides the foundational runtime for all Diffusers operations. PyTorch is the primary deep learning framework, with device selection following a priority order: CUDA > NPU > XPU > MPS > MLU > CPU. The library uses PyTorch's native scaled dot-product attention (SDPA) as the default attention backend, which requires PyTorch >= 2.0. For advanced attention processors (AttnProcessor2_0, JointAttnProcessor2_0, FusedAttnProcessor2_0, etc.), PyTorch >= 2.0 is a hard requirement. The flex_attention backend requires PyTorch >= 2.5.0, and torch.library.custom_op support requires PyTorch >= 2.4.0.
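The fallback chain described above can be sketched in plain Python. The `backends` dict below is a hypothetical stand-in for the real runtime checks (`torch.cuda.is_available()`, `is_torch_npu_available()`, and so on); it keeps the sketch runnable without any accelerator present.

```python
def pick_device(backends: dict) -> str:
    """Return the first available backend in Diffusers' priority order.

    `backends` maps backend name -> bool availability, standing in for
    the real checks such as torch.cuda.is_available().
    """
    for name in ("cuda", "npu", "xpu", "mps", "mlu"):
        if backends.get(name, False):
            return name
    return "cpu"  # CPU is always available as the final fallback

# Example: a macOS machine with Metal (MPS) but no CUDA
print(pick_device({"cuda": False, "mps": True}))  # mps
```

Note that the chain short-circuits: on a machine with both CUDA and MPS available, CUDA wins because it is checked first.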

Usage

Required for all Diffusers workflows including text-to-image inference, LoRA fine-tuning, DreamBooth personalization, ControlNet guided generation, video generation, model quantization, and checkpoint conversion. Every Implementation page in this wiki depends on this environment.

System Requirements

  • OS: Linux (recommended), macOS, or Windows; Linux offers the fullest CUDA support.
  • Hardware: NVIDIA GPU (CUDA), Intel XPU, Apple MPS, Huawei NPU, Cambricon MLU, or CPU. A GPU is strongly recommended for inference and required for training.
  • Disk: 10GB+ free space for model weights (varies per model: SD 1.5 ~4GB, SDXL ~7GB, Flux ~24GB).
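A pre-flight check against the disk requirement can be sketched with the standard library. The per-model sizes below are the rough estimates quoted above; treat them as approximations, not exact download sizes.

```python
import shutil

# Approximate on-disk weight sizes (GB), per the estimates above
MODEL_SIZES_GB = {"sd-1.5": 4, "sdxl": 7, "flux": 24}

def has_space_for(model: str, path: str = ".") -> bool:
    """Return True if the filesystem holding `path` has room for the model weights."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= MODEL_SIZES_GB[model]

print(has_space_for("sd-1.5"))
```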

Dependencies

System Packages

  • CUDA toolkit (for NVIDIA GPUs) — version matching your PyTorch build
  • cuDNN (bundled with CUDA toolkit)

Python Packages

  • `torch` >= 1.4 (>= 2.0 required for default attention processors)
  • `huggingface-hub` >= 0.34.0, < 2.0
  • `safetensors` >= 0.3.1
  • `numpy`
  • `Pillow`
  • `requests`
  • `filelock`
  • `regex` != 2019.12.17
  • `httpx` < 1.0.0
  • `importlib_metadata`
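These version floors gate specific features: SDPA needs PyTorch 2.0, `torch.library.custom_op` needs 2.4, and flex_attention needs 2.5. A minimal sketch of that gating, using a plain tuple comparison instead of Diffusers' real version helper (so it is safe on versions like "2.10", where naive string comparison would fail):

```python
# Minimum PyTorch versions for each feature, per the list above
FEATURE_MIN_VERSION = {
    "sdpa": (2, 0),
    "custom_op": (2, 4),
    "flex_attention": (2, 5),
}

def parse_version(v: str) -> tuple:
    """Turn '2.4.0' or '2.4.0+cu121' into (2, 4, 0) for numeric comparison."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3] if p.isdigit())

def supports(feature: str, torch_version: str) -> bool:
    """Check whether the given PyTorch version meets the feature's floor."""
    return parse_version(torch_version) >= FEATURE_MIN_VERSION[feature]

print(supports("sdpa", "2.1.0"))            # True
print(supports("flex_attention", "2.4.1"))  # False
```

In real code you would pass `torch.__version__` as the second argument.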

Credentials

The following environment variables may be needed:

  • `HF_TOKEN`: HuggingFace API token — required for gated models (e.g., Stable Diffusion, Flux)
  • `HF_ENDPOINT`: Custom HuggingFace Hub endpoint (default: `https://huggingface.co`)
  • `HF_HOME`: HuggingFace cache directory
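Resolution of these variables can be sketched with the standard library. The defaults shown (the public Hub endpoint and `~/.cache/huggingface`) follow the documented behavior of the Hub client, but treat the exact cache layout as an assumption; `hub_settings` is a hypothetical helper, not a huggingface-hub API.

```python
import os
from pathlib import Path

def hub_settings(env: dict) -> dict:
    """Resolve Hub-related settings from an environment mapping.

    Pass os.environ in real use; a plain dict here keeps the sketch testable.
    """
    return {
        "token": env.get("HF_TOKEN"),  # None means anonymous access
        "endpoint": env.get("HF_ENDPOINT", "https://huggingface.co"),
        "cache_home": env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")),
    }

print(hub_settings({})["endpoint"])  # https://huggingface.co
```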

Quick Install

# Install diffusers with PyTorch support
pip install diffusers[torch] transformers accelerate safetensors

# For CUDA GPU support (if not already installed)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Code Evidence

Device selection priority from `torch_utils.py:280-293`:

@functools.lru_cache
def get_device():
    if torch.cuda.is_available():
        return "cuda"
    elif is_torch_npu_available():
        return "npu"
    elif hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    elif torch.backends.mps.is_available():
        return "mps"
    elif is_torch_mlu_available():
        return "mlu"
    else:
        return "cpu"

PyTorch 2.0 requirement for attention processors from `attention_processor.py:1834`:

class AttnProcessor2_0:
    """Processor for implementing scaled dot-product attention (enabled by default if you're using PyTorch 2.0)."""
    def __init__(self):
        if not hasattr(F, "scaled_dot_product_attention"):
            raise ImportError("AttnProcessor2_0 requires PyTorch 2.0, ...")

PyTorch version guard for custom_op from `attention_dispatch.py:139-158`:

if torch.__version__ >= "2.4.0":
    _custom_op = torch.library.custom_op
    _register_fake = torch.library.register_fake
else:
    # No-op fallbacks for older PyTorch versions
    def custom_op_no_op(name, fn=None, /, *, mutates_args, ...):
        def wrap(func):
            return func
        return wrap if fn is None else fn

    _custom_op = custom_op_no_op

Common Errors

  • `AttnProcessor2_0 requires PyTorch 2.0`: PyTorch < 2.0 is installed. Fix: `pip install -U torch`.
  • `enable_model_cpu_offload requires accelerator, but not found`: no GPU was detected when CPU offloading was requested. Fix: ensure GPU drivers are installed, or use a CPU-only pipeline instead.
  • `Cannot generate a {device} tensor from a generator of type cuda`: the generator's device does not match the target tensor's device. Fix: create the generator on the same device as the target tensor.

Compatibility Notes

  • NVIDIA CUDA: Primary supported accelerator. Required for BitsAndBytes quantization.
  • Intel XPU: Supported via `torch.xpu` (requires recent PyTorch builds).
  • Apple MPS: Supported for inference. Training support is limited (`BACKEND_SUPPORTS_TRAINING["mps"] = False`).
  • Huawei NPU: Supported via `torch_npu` extension. Has dedicated attention fusion (`npu_fusion_attention`).
  • Cambricon MLU: Supported via `torch_mlu` extension.
  • CPU: Always available as fallback. Significantly slower for inference.
  • Deterministic mode: Sets `CUDA_LAUNCH_BLOCKING=1` and `CUBLAS_WORKSPACE_CONFIG=:16:8`.
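The deterministic-mode note in the last bullet can be sketched as follows. What matters is that these variables are set before the first CUDA call; the helper name is hypothetical.

```python
import os

def enable_deterministic_cuda_env() -> dict:
    """Set the env vars noted above for deterministic CUDA behavior.

    CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous;
    CUBLAS_WORKSPACE_CONFIG=:16:8 pins cuBLAS to deterministic workspaces.
    Must run before CUDA is initialized to take effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
    return {k: os.environ[k] for k in ("CUDA_LAUNCH_BLOCKING", "CUBLAS_WORKSPACE_CONFIG")}

print(enable_deterministic_cuda_env())
```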
