
Environment: AUTOMATIC1111 Stable Diffusion WebUI GPU Compute Backend

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Hardware
Last Updated: 2026-02-08 08:00 GMT

Overview

Multi-backend GPU compute environment supporting NVIDIA CUDA, Apple MPS, Intel XPU, and Huawei Ascend NPU with automatic device detection and fallback to CPU.

Description

The WebUI abstracts GPU hardware through `modules/devices.py`, which implements a hierarchical device selection strategy: CUDA > MPS > XPU > NPU > CPU. Each backend has platform-specific modules (`mac_specific.py`, `xpu_specific.py`, `npu_specific.py`) that handle device detection, memory management, and garbage collection. The system supports mixed-precision inference (fp16/fp32/fp8) with automatic casting and manual casting fallbacks for devices that lack native autocast support (MPS, XPU, GTX 16 series).
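The casting policy described above can be sketched as a small decision function. This is an illustrative simplification, not the WebUI's actual API: the real logic lives in `modules/devices.py` and consults torch directly, and the helper name `precision_policy` is invented here.

```python
# Hypothetical sketch of the precision policy: which dtype to run in, and
# whether native torch.autocast can be relied on, for each backend described
# above. Rules are simplified for illustration.

def precision_policy(device: str, is_gtx16: bool = False,
                     no_half: bool = False) -> tuple[str, bool]:
    """Return (dtype, use_native_autocast) for a given backend."""
    if no_half or device == "cpu":
        return ("fp32", False)            # CPU and --no-half run full precision
    if device in ("mps", "xpu") or (device == "cuda" and is_gtx16):
        # Backends without reliable native autocast: cast tensors manually
        return ("fp16", False)
    return ("fp16", True)                 # Plain CUDA: native autocast works

print(precision_policy("cuda"))                 # ('fp16', True)
print(precision_policy("mps"))                  # ('fp16', False)
print(precision_policy("cuda", is_gtx16=True))  # ('fp16', False)
```

The manual-cast branch corresponds to the fallback path the WebUI takes for MPS, XPU, and GTX 16 series devices.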

Usage

This environment is required for GPU-accelerated inference and training. Without a GPU, the WebUI falls back to CPU mode, which is significantly slower. The GPU backend is used by all generation workflows (txt2img, img2img), all training workflows (textual inversion, hypernetwork), and all upscaling operations.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| NVIDIA GPU | Compute capability >= 6.0 | Required for xformers; capability 7.5 (GTX 16xx) needs special handling |
| NVIDIA Driver | Compatible with CUDA 12.1 | Driver >= 530.xx recommended |
| CUDA Toolkit | 12.1 (default) | 11.8 supported via WSL2 conda environment |
| Apple Silicon | M1/M2/M3 with MPS | macOS with Metal Performance Shaders support |
| Intel Arc | XPU with oneAPI | Requires intel-extension-for-pytorch |
| Ascend NPU | Huawei NPU with torch_npu | Requires separate requirements_npu.txt |
| VRAM | 4GB minimum | 8GB+ recommended; 4GB requires --lowvram flag |

Dependencies

NVIDIA CUDA

  • `torch` == 2.1.2 (cu121 wheels)
  • `torchvision` == 0.16.2
  • NVIDIA driver >= 530.xx

Apple MPS

  • `torch` >= 2.0.1 (MPS backend)
  • macOS 12.3+ (Monterey or later)
  • Set `PYTORCH_ENABLE_MPS_FALLBACK=1` environment variable

Intel XPU

  • `intel-extension-for-pytorch` == 2.0.110
  • `torch` == 2.0.0a0 (Intel build)
  • oneAPI toolkit (Linux) or bundled DLLs (Windows)

Huawei NPU

  • `torch_npu` package
  • `cloudpickle`, `decorator`, `synr` == 0.5.0, `tornado`

Environment Variables

  • `TORCH_INDEX_URL`: PyTorch wheel index URL (default: https://download.pytorch.org/whl/cu121)
  • `TORCH_COMMAND`: Custom torch installation command
  • `PYTORCH_ENABLE_MPS_FALLBACK`: Required for MPS devices (set to "1")
  • `CUDA_VISIBLE_DEVICES`: GPU selection for multi-GPU systems
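A launcher consumes these variables roughly as follows. This is a hedged sketch, not the WebUI's actual launch code (which lives in `launch.py` / `modules/launch_utils.py`); the helper name `torch_install_command` is invented, but the precedence shown (explicit `TORCH_COMMAND` overrides `TORCH_INDEX_URL`, which falls back to the cu121 index) mirrors the list above:

```python
import os

# Illustrative: build the torch install command from the environment
# variables listed above, with TORCH_COMMAND taking precedence.
def torch_install_command() -> str:
    explicit = os.environ.get("TORCH_COMMAND")
    if explicit:
        return explicit
    index = os.environ.get("TORCH_INDEX_URL",
                           "https://download.pytorch.org/whl/cu121")
    return (f"pip install torch==2.1.2 torchvision==0.16.2 "
            f"--extra-index-url {index}")
```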

Quick Install

# NVIDIA CUDA (default)
pip install torch==2.1.2 torchvision==0.16.2 --extra-index-url https://download.pytorch.org/whl/cu121

# Intel XPU (Linux)
pip install torch==2.0.0a0 intel-extension-for-pytorch==2.0.110+gitba7f6c1 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Code Evidence

Hierarchical device selection from `modules/devices.py:50-63`:

def get_optimal_device_name():
    if torch.cuda.is_available():
        return get_cuda_device_string()

    if has_mps():
        return "mps"

    if has_xpu():
        return xpu_specific.get_xpu_device_string()

    if npu_specific.has_npu:
        return npu_specific.get_npu_device_string()

    return "cpu"

GTX 16 series detection from `modules/devices.py:26-32`:

def cuda_no_autocast(device_id=None) -> bool:
    if device_id is None:
        device_id = get_cuda_device_id()
    return (
        torch.cuda.get_device_capability(device_id) == (7, 5)
        and torch.cuda.get_device_name(device_id).startswith("NVIDIA GeForce GTX 16")
    )

Xformers availability check from `modules/sd_hijack_optimizations.py:56-57`:

def is_available(self):
    return shared.cmd_opts.force_enable_xformers or (
        shared.xformers_available and torch.cuda.is_available()
        and (6, 0) <= torch.cuda.get_device_capability(shared.device) <= (9, 0)
    )

NPU bug workaround from `modules/devices.py:95-98`:

def torch_npu_set_device():
    # Work around due to bug in torch_npu, revert me after fixed
    if npu_specific.has_npu:
        torch.npu.set_device(0)

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `Torch is not able to use GPU` | No CUDA-capable GPU or driver issue | Install NVIDIA drivers; use `--skip-torch-cuda-test` for non-CUDA setups |
| `CUDA out of memory` | Insufficient VRAM | Use `--medvram` or `--lowvram` flags; reduce image resolution |
| `A tensor with NaNs was produced in Unet` | GPU lacks fp16 precision support | Use `--upcast-sampling` or `--no-half` flag |
| `A tensor with NaNs was produced in VAE` | VAE precision issue | Use `--no-half-vae` flag |
| MPS tensor errors | MPS backend limitations | Ensure `PYTORCH_ENABLE_MPS_FALLBACK=1` is set |

Compatibility Notes

  • NVIDIA CUDA: Primary supported platform. Compute capability 6.0+ for xformers. GTX 16 series (capability 7.5) requires benchmark mode for fp16.
  • Apple MPS: Requires PyTorch >= 2.0.1. Uses sub-quadratic attention as default optimizer (priority 1000 on MPS). Needs `PYTORCH_ENABLE_MPS_FALLBACK=1`.
  • Intel XPU: Requires `--use-ipex` flag. Windows needs unofficial wheels (Python 3.10 only). Linux uses official IPEX with manual oneAPI setup.
  • Huawei NPU: Has a known `torch_npu` device-setting bug requiring explicit `torch.npu.set_device(0)` calls.
  • CPU: Supported as fallback but extremely slow. Usable with `--use-cpu all` flag.
  • Multi-GPU: Use `--device-id` to select specific GPU. For visibility control, set `CUDA_VISIBLE_DEVICES` before launch.
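The multi-GPU note can be made concrete with a short snippet. `CUDA_VISIBLE_DEVICES` must be set before torch (or anything that initializes CUDA) is imported; after that, torch renumbers the visible devices from zero:

```python
import os

# Restrict the process to the second physical GPU. Once set, the visible
# GPU appears to torch as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Equivalent at launch time (shell):
#   CUDA_VISIBLE_DEVICES=1 python launch.py
# Or, selecting among the GPUs that remain visible via the WebUI itself:
#   python launch.py --device-id 0
```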
