Environment:AUTOMATIC1111 Stable diffusion webui GPU Compute Backend
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Hardware |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Multi-backend GPU compute environment supporting NVIDIA CUDA, Apple MPS, Intel XPU, and Huawei Ascend NPU with automatic device detection and fallback to CPU.
Description
The WebUI abstracts GPU hardware through `modules/devices.py`, which implements a hierarchical device selection strategy: CUDA > MPS > XPU > NPU > CPU. Each backend has platform-specific modules (`mac_specific.py`, `xpu_specific.py`, `npu_specific.py`) that handle device detection, memory management, and garbage collection. The system supports mixed-precision inference (fp16/fp32/fp8) with automatic casting and manual casting fallbacks for devices that lack native autocast support (MPS, XPU, GTX 16 series).
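The CUDA > MPS > XPU > NPU > CPU priority chain can be sketched without any GPU libraries. The probe flags below are stand-ins for the real availability checks in `modules/devices.py` (`torch.cuda.is_available()`, `has_mps()`, and so on), not actual WebUI APIs:

```python
# Minimal sketch of the hierarchical device selection described above.
# The boolean parameters substitute for the real backend availability probes.

def pick_device(cuda=False, mps=False, xpu=False, npu=False):
    """Return the first available backend in priority order, else CPU."""
    for name, available in (("cuda", cuda), ("mps", mps), ("xpu", xpu), ("npu", npu)):
        if available:
            return name
    return "cpu"

pick_device(cuda=True, mps=True)  # -> "cuda" (highest priority wins)
pick_device(mps=True)             # -> "mps"
pick_device()                     # -> "cpu" (fallback)
```

The real implementation returns a device *string* (e.g. `"cuda:0"`) rather than a bare backend name; see the Code Evidence section.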
Usage
This environment is required for GPU-accelerated inference and training. Without a GPU, the WebUI falls back to CPU mode, which is significantly slower. The GPU backend is used by all generation workflows (txt2img, img2img), all training workflows (textual inversion, hypernetwork), and all upscaling operations.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| NVIDIA GPU | Compute capability >= 6.0 | Required for xformers; capability 7.5 (GTX 16xx) needs special handling |
| NVIDIA Driver | Compatible with CUDA 12.1 | Driver >= 530.xx recommended |
| CUDA Toolkit | 12.1 (default) | 11.8 supported via WSL2 conda environment |
| Apple Silicon | M1/M2/M3 with MPS | macOS with Metal Performance Shaders support |
| Intel Arc | XPU with oneAPI | Requires intel-extension-for-pytorch |
| Ascend NPU | Huawei NPU with torch_npu | Requires separate requirements_npu.txt |
| VRAM | 4GB minimum | 8GB+ recommended; 4GB requires --lowvram flag |
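A small helper can map the VRAM guidance above to launch flags. The 4GB/8GB thresholds come from the table; the 6GB cutoff for `--medvram` is an illustrative assumption, not documented WebUI behavior:

```python
def suggest_vram_flags(vram_gb: float) -> list:
    """Rough launch-flag suggestion from available VRAM (sketch only).

    Thresholds: below the documented 4GB minimum fall back to CPU;
    the 6GB boundary between --lowvram and --medvram is an assumption.
    """
    if vram_gb < 4:
        return ["--use-cpu", "all"]   # below documented minimum
    if vram_gb < 6:
        return ["--lowvram"]          # 4GB tier per the table
    if vram_gb < 8:
        return ["--medvram"]          # assumed middle tier
    return []                         # 8GB+ needs no memory flags

suggest_vram_flags(4)   # -> ["--lowvram"]
suggest_vram_flags(12)  # -> []
```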
Dependencies
NVIDIA CUDA
- `torch` == 2.1.2 (cu121 wheels)
- `torchvision` == 0.16.2
- NVIDIA driver >= 530.xx
Apple MPS
- `torch` >= 2.0.1 (MPS backend)
- macOS 12.3+ (Monterey or later)
- Set the `PYTORCH_ENABLE_MPS_FALLBACK=1` environment variable
Intel XPU
- `intel-extension-for-pytorch` == 2.0.110
- `torch` == 2.0.0a0 (Intel build)
- oneAPI toolkit (Linux) or bundled DLLs (Windows)
Huawei NPU
- `torch_npu` package
- `cloudpickle`, `decorator`, `synr` == 0.5.0, `tornado`
Environment Variables
- `TORCH_INDEX_URL`: PyTorch wheel index URL (default: https://download.pytorch.org/whl/cu121)
- `TORCH_COMMAND`: Custom torch installation command
- `PYTORCH_ENABLE_MPS_FALLBACK`: Required for MPS devices (set to "1")
- `CUDA_VISIBLE_DEVICES`: GPU selection for multi-GPU systems
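These variables must be set before the process imports torch. A minimal pre-launch setup, using the values documented above:

```python
import os

# Illustrative pre-launch environment setup; set these before torch is imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"   # required on Apple MPS
os.environ["CUDA_VISIBLE_DEVICES"] = "0"          # pin to the first GPU
os.environ["TORCH_INDEX_URL"] = "https://download.pytorch.org/whl/cu121"  # default wheel index
```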
Quick Install
```shell
# NVIDIA CUDA (default)
pip install torch==2.1.2 torchvision==0.16.2 --extra-index-url https://download.pytorch.org/whl/cu121

# Intel XPU (Linux)
pip install torch==2.0.0a0 intel-extension-for-pytorch==2.0.110+gitba7f6c1 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
Code Evidence
Hierarchical device selection from `modules/devices.py:50-63`:
```python
def get_optimal_device_name():
    if torch.cuda.is_available():
        return get_cuda_device_string()

    if has_mps():
        return "mps"

    if has_xpu():
        return xpu_specific.get_xpu_device_string()

    if npu_specific.has_npu:
        return npu_specific.get_npu_device_string()

    return "cpu"
```
GTX 16 series detection from `modules/devices.py:26-32`:
```python
def cuda_no_autocast(device_id=None) -> bool:
    if device_id is None:
        device_id = get_cuda_device_id()
    return (
        torch.cuda.get_device_capability(device_id) == (7, 5)
        and torch.cuda.get_device_name(device_id).startswith("NVIDIA GeForce GTX 16")
    )
```
Xformers availability check from `modules/sd_hijack_optimizations.py:56-57`:
```python
def is_available(self):
    return shared.cmd_opts.force_enable_xformers or (
        shared.xformers_available and torch.cuda.is_available()
        and (6, 0) <= torch.cuda.get_device_capability(shared.device) <= (9, 0)
    )
```
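The capability gate in the xformers check reduces to an inclusive tuple comparison, which can be verified without a GPU. This standalone sketch uses plain tuples in place of `torch.cuda.get_device_capability` results (tuples compare lexicographically, matching torch's behavior):

```python
def xformers_capability_ok(capability):
    """Inclusive (major, minor) range check mirroring the gate above."""
    return (6, 0) <= capability <= (9, 0)

xformers_capability_ok((7, 5))  # GTX 16xx / RTX 20xx -> True
xformers_capability_ok((5, 2))  # Maxwell -> False
xformers_capability_ok((9, 0))  # upper bound is inclusive -> True
```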
NPU bug workaround from `modules/devices.py:95-98`:
```python
def torch_npu_set_device():
    # Work around due to bug in torch_npu, revert me after fixed
    if npu_specific.has_npu:
        torch.npu.set_device(0)
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Torch is not able to use GPU` | No CUDA-capable GPU or driver issue | Install NVIDIA drivers; use `--skip-torch-cuda-test` for non-CUDA setups |
| `CUDA out of memory` | Insufficient VRAM | Use `--medvram` or `--lowvram` flags; reduce image resolution |
| `A tensor with NaNs was produced in Unet` | GPU lacks fp16 precision support | Use `--upcast-sampling` or `--no-half` flag |
| `A tensor with NaNs was produced in VAE` | VAE precision issue | Use `--no-half-vae` flag |
| MPS tensor errors | MPS backend limitations | Ensure `PYTORCH_ENABLE_MPS_FALLBACK=1` is set |
Compatibility Notes
- NVIDIA CUDA: Primary supported platform. Compute capability 6.0+ for xformers. GTX 16 series (capability 7.5) requires benchmark mode for fp16.
- Apple MPS: Requires PyTorch >= 2.0.1. Uses sub-quadratic attention as default optimizer (priority 1000 on MPS). Needs `PYTORCH_ENABLE_MPS_FALLBACK=1`.
- Intel XPU: Requires `--use-ipex` flag. Windows needs unofficial wheels (Python 3.10 only). Linux uses official IPEX with manual oneAPI setup.
- Huawei NPU: Has a known `torch_npu` device-setting bug requiring explicit `torch.npu.set_device(0)` calls.
- CPU: Supported as fallback but extremely slow. Usable with `--use-cpu all` flag.
- Multi-GPU: Use `--device-id` to select specific GPU. For visibility control, set `CUDA_VISIBLE_DEVICES` before launch.
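For multi-GPU selection, `--device-id N` resolves to a `cuda:N` device string, following the `cuda`/`cuda:N` convention torch uses. This is a sketch of that mapping, not the exact WebUI implementation:

```python
from typing import Optional

def cuda_device_string(device_id: Optional[int]) -> str:
    """Map an optional --device-id value to a torch device string (sketch)."""
    if device_id is None:
        return "cuda"            # let torch use the current/default device
    return f"cuda:{device_id}"   # pin to a specific GPU index

cuda_device_string(None)  # -> "cuda"
cuda_device_string(1)     # -> "cuda:1"
```

Note that `CUDA_VISIBLE_DEVICES` renumbers the visible GPUs, so `cuda:0` always refers to the first *visible* device, not necessarily the first physical one.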
Related Pages
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_StableDiffusionProcessingTxt2Img
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_KDiffusionSampler_sample
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Decode_latent_batch
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Train_embedding
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Train_hypernetwork
- Implementation:AUTOMATIC1111_Stable_diffusion_webui_Upscaler_upscale