Environment:AUTOMATIC1111 Stable diffusion webui GPU Compute Backend
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Hardware |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Multi-backend GPU compute environment supporting NVIDIA CUDA, Apple MPS, Intel XPU, and Huawei Ascend NPU with automatic device detection and fallback to CPU.
Description
The WebUI abstracts GPU hardware through `modules/devices.py`, which implements a hierarchical device selection strategy: CUDA > MPS > XPU > NPU > CPU. Each backend has platform-specific modules (`mac_specific.py`, `xpu_specific.py`, `npu_specific.py`) that handle device detection, memory management, and garbage collection. The system supports mixed-precision inference (fp16/fp32/fp8) with automatic casting and manual casting fallbacks for devices that lack native autocast support (MPS, XPU, GTX 16 series).
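The CUDA > MPS > XPU > NPU > CPU priority chain can be sketched without any GPU libraries. The probe flags below are stand-ins for the real availability checks in `modules/devices.py` (`torch.cuda.is_available()`, `has_mps()`, and so on), not actual WebUI APIs:

```python
# Minimal sketch of the hierarchical device selection described above.
# The boolean parameters substitute for the real backend availability probes.

def pick_device(cuda=False, mps=False, xpu=False, npu=False):
    """Return the first available backend in priority order, else CPU."""
    for name, available in (("cuda", cuda), ("mps", mps), ("xpu", xpu), ("npu", npu)):
        if available:
            return name
    return "cpu"

pick_device(cuda=True, mps=True)  # -> "cuda" (highest priority wins)
pick_device(mps=True)             # -> "mps"
pick_device()                     # -> "cpu" (fallback)
```

The real implementation returns a device *string* (e.g. `"cuda:0"`) rather than a bare backend name; see the Code Evidence section.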
Usage
This environment is required for GPU-accelerated inference and training. Without a GPU, the WebUI falls back to CPU mode, which is significantly slower. The GPU backend is used by all generation workflows (txt2img, img2img), all training workflows (textual inversion, hypernetwork), and all upscaling operations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| NVIDIA GPU | Compute capability >= 6.0 | Required for xformers; capability 7.5 (GTX 16xx) needs special handling |
| NVIDIA Driver | Compatible with CUDA 12.1 | Driver >= 530.xx recommended |
| CUDA Toolkit | 12.1 (default) | 11.8 supported via WSL2 conda environment |
| Apple Silicon | M1/M2/M3 with MPS | macOS with Metal Performance Shaders support |
| Intel Arc | XPU with oneAPI | Requires intel-extension-for-pytorch |
| Ascend NPU | Huawei NPU with torch_npu | Requires separate requirements_npu.txt |
| VRAM | 4GB minimum | 8GB+ recommended; 4GB requires --lowvram flag |
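A small helper can map the VRAM guidance above to launch flags. The 4GB/8GB thresholds come from the table; the 6GB cutoff for `--medvram` is an illustrative assumption, not documented WebUI behavior:

```python
def suggest_vram_flags(vram_gb: float) -> list:
    """Rough launch-flag suggestion from available VRAM (sketch only).

    Thresholds: below the documented 4GB minimum fall back to CPU;
    the 6GB boundary between --lowvram and --medvram is an assumption.
    """
    if vram_gb < 4:
        return ["--use-cpu", "all"]   # below documented minimum
    if vram_gb < 6:
        return ["--lowvram"]          # 4GB tier per the table
    if vram_gb < 8:
        return ["--medvram"]          # assumed middle tier
    return []                         # 8GB+ needs no memory flags

suggest_vram_flags(4)   # -> ["--lowvram"]
suggest_vram_flags(12)  # -> []
```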
Dependencies
NVIDIA CUDA
- `torch` == 2.1.2 (cu121 wheels)
- `torchvision` == 0.16.2
- NVIDIA driver >= 530.xx
Apple MPS
- `torch` >= 2.0.1 (MPS backend)
- macOS 12.3+ (Monterey or later)
- Set the `PYTORCH_ENABLE_MPS_FALLBACK=1` environment variable
Intel XPU
- `intel-extension-for-pytorch` == 2.0.110
- `torch` == 2.0.0a0 (Intel build)
- oneAPI toolkit (Linux) or bundled DLLs (Windows)
Huawei NPU
- `torch_npu` package
- `cloudpickle`, `decorator`, `synr` == 0.5.0, `tornado`
Environment Variables
- `TORCH_INDEX_URL`: PyTorch wheel index URL (default: https://download.pytorch.org/whl/cu121)
- `TORCH_COMMAND`: Custom torch installation command
- `PYTORCH_ENABLE_MPS_FALLBACK`: Required for MPS devices (set to "1")
- `CUDA_VISIBLE_DEVICES`: GPU selection for multi-GPU systems
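These variables must be set before the process imports torch. A minimal pre-launch setup, using the values documented above:

```python
import os

# Illustrative pre-launch environment setup; set these before torch is imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"   # required on Apple MPS
os.environ["CUDA_VISIBLE_DEVICES"] = "0"          # pin to the first GPU
os.environ["TORCH_INDEX_URL"] = "https://download.pytorch.org/whl/cu121"  # default wheel index
```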
Quick Install
```shell
# NVIDIA CUDA (default)
pip install torch==2.1.2 torchvision==0.16.2 --extra-index-url https://download.pytorch.org/whl/cu121

# Intel XPU (Linux)
pip install torch==2.0.0a0 intel-extension-for-pytorch==2.0.110+gitba7f6c1 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
Code Evidence
Hierarchical device selection from `modules/devices.py:50-63`:
```python
def get_optimal_device_name():
    if torch.cuda.is_available():
        return get_cuda_device_string()

    if has_mps():
        return "mps"

    if has_xpu():
        return xpu_specific.get_xpu_device_string()

    if npu_specific.has_npu:
        return npu_specific.get_npu_device_string()

    return "cpu"
```
GTX 16 series detection from `modules/devices.py:26-32`:
```python
def cuda_no_autocast(device_id=None) -> bool:
    if device_id is None:
        device_id = get_cuda_device_id()
    return (
        torch.cuda.get_device_capability(device_id) == (7, 5)
        and torch.cuda.get_device_name(device_id).startswith("NVIDIA GeForce GTX 16")
    )
```
Xformers availability check from `modules/sd_hijack_optimizations.py:56-57`:
```python
def is_available(self):
    return shared.cmd_opts.force_enable_xformers or (
        shared.xformers_available and torch.cuda.is_available()
        and (6, 0) <= torch.cuda.get_device_capability(shared.device) <= (9, 0)
    )
```
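The capability gate in the xformers check reduces to an inclusive tuple comparison, which can be verified without a GPU. This standalone sketch uses plain tuples in place of `torch.cuda.get_device_capability` results (tuples compare lexicographically, matching torch's behavior):

```python
def xformers_capability_ok(capability):
    """Inclusive (major, minor) range check mirroring the gate above."""
    return (6, 0) <= capability <= (9, 0)

xformers_capability_ok((7, 5))  # GTX 16xx / RTX 20xx -> True
xformers_capability_ok((5, 2))  # Maxwell -> False
xformers_capability_ok((9, 0))  # upper bound is inclusive -> True
```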
NPU bug workaround from `modules/devices.py:95-98`:
```python
def torch_npu_set_device():
    # Work around due to bug in torch_npu, revert me after fixed
    if npu_specific.has_npu:
        torch.npu.set_device(0)
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Torch is not able to use GPU` | No CUDA-capable GPU or driver issue | Install NVIDIA drivers; use `--skip-torch-cuda-test` for non-CUDA setups |
| `CUDA out of memory` | Insufficient VRAM | Use `--medvram` or `--lowvram` flags; reduce image resolution |
| `A tensor with NaNs was produced in Unet` | GPU lacks fp16 precision support | Use `--upcast-sampling` or `--no-half` flag |
| `A tensor with NaNs was produced in VAE` | VAE precision issue | Use `--no-half-vae` flag |
| MPS tensor errors | MPS backend limitations | Ensure `PYTORCH_ENABLE_MPS_FALLBACK=1` is set |
Compatibility Notes
- NVIDIA CUDA: Primary supported platform. Compute capability 6.0+ for xformers. GTX 16 series (capability 7.5) requires benchmark mode for fp16.
- Apple MPS: Requires PyTorch >= 2.0.1. Uses sub-quadratic attention as default optimizer (priority 1000 on MPS). Needs `PYTORCH_ENABLE_MPS_FALLBACK=1`.
- Intel XPU: Requires `--use-ipex` flag. Windows needs unofficial wheels (Python 3.10 only). Linux uses official IPEX with manual oneAPI setup.
- Huawei NPU: Has a known `torch_npu` device-setting bug requiring explicit `torch.npu.set_device(0)` calls.
- CPU: Supported as fallback but extremely slow. Usable with `--use-cpu all` flag.
- Multi-GPU: Use `--device-id` to select specific GPU. For visibility control, set `CUDA_VISIBLE_DEVICES` before launch.
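For multi-GPU selection, `--device-id N` resolves to a `cuda:N` device string, following the `cuda`/`cuda:N` convention torch uses. This is a sketch of that mapping, not the exact WebUI implementation:

```python
from typing import Optional

def cuda_device_string(device_id: Optional[int]) -> str:
    """Map an optional --device-id value to a torch device string (sketch)."""
    if device_id is None:
        return "cuda"            # let torch use the current/default device
    return f"cuda:{device_id}"   # pin to a specific GPU index

cuda_device_string(None)  # -> "cuda"
cuda_device_string(1)     # -> "cuda:1"
```

Note that `CUDA_VISIBLE_DEVICES` renumbers the visible GPUs, so `cuda:0` always refers to the first *visible* device, not necessarily the first physical one.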
Related Pages
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_StableDiffusionProcessingTxt2Img
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_KDiffusionSampler_sample
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Decode_latent_batch
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Train_embedding
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Train_hypernetwork
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Upscaler_upscale