Heuristic: AUTOMATIC1111 Stable Diffusion WebUI NaN Detection and Precision Fixes
| Knowledge Sources | |
|---|---|
| Domains | Debugging, Optimization |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Runtime NaN detection system with context-specific diagnostic messages and a graduated precision fix escalation path: `--upcast-sampling`, `--no-half-vae`, `--no-half`, and `--disable-nan-check`.
Description
The WebUI includes a NaN detection system (`test_for_nans`) that checks the first element of output tensors from the UNet and VAE for NaN values. When detected, it raises a `NansException` with a context-specific error message that suggests the appropriate fix. The system recognizes two distinct failure contexts (UNet and VAE) and provides different fix recommendations for each. The check can be disabled entirely with `--disable-nan-check` for CI environments or when running without a model.
Usage
This heuristic helps diagnose black images, corrupted outputs, or generation failures caused by floating-point precision issues. The graduated fix path is:
- `--upcast-sampling`: Upcast the sampling computation to fp32 while keeping the model weights in fp16 (best balance of speed and quality).
- `--no-half-vae`: Keep only the VAE in fp32 (fixes VAE-specific NaN issues without affecting UNet speed).
- `--no-half`: Run the entire model in fp32 (most conservative; slowest, highest VRAM usage).
- `--disable-nan-check`: Disable the check entirely (for CI/testing only).
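The first three flags translate, roughly, into dtype choices for the UNet weights, the sampling computation, and the VAE. A sketch of that mapping (the `PrecisionFlags` holder and `choose_dtypes` function are hypothetical illustrations, not WebUI API):

```python
from dataclasses import dataclass

@dataclass
class PrecisionFlags:
    # hypothetical stand-ins for the command-line flags
    no_half: bool = False
    no_half_vae: bool = False
    upcast_sampling: bool = False

def choose_dtypes(flags):
    """Return (unet_weights, sampling_compute, vae) dtype names."""
    unet = "float32" if flags.no_half else "float16"
    # --upcast-sampling keeps fp16 weights but computes in fp32
    sampling = "float32" if (flags.no_half or flags.upcast_sampling) else "float16"
    # --no-half implies fp32 everywhere, including the VAE
    vae = "float32" if (flags.no_half or flags.no_half_vae) else "float16"
    return unet, sampling, vae
```

For example, `choose_dtypes(PrecisionFlags(no_half_vae=True))` leaves the UNet and sampling in fp16 and only upgrades the VAE.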
The Insight (Rule of Thumb)
- Action: If UNet produces NaN, try `--upcast-sampling` first, then `--no-half`. If VAE produces NaN, try `--no-half-vae`.
- Value: The check tests only the first element `x[(0,) * len(x.shape)]` for efficiency rather than scanning the entire tensor.
- Trade-off: `--no-half` doubles VRAM usage and reduces speed. `--upcast-sampling` has minimal overhead. `--no-half-vae` affects only VAE encode/decode steps.
- Context sensitivity: The "Upcast cross attention layer to float32" settings option provides a middle ground that upcasts only attention layers.
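The `(0,) * len(x.shape)` indexing trick mentioned above builds a tuple of zeros matching the tensor's rank, so exactly one scalar is read regardless of shape (shown here with NumPy, whose indexing behaves the same way as torch's for this purpose):

```python
import numpy as np

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
idx = (0,) * len(x.shape)   # -> (0, 0, 0) for a rank-3 tensor
first = x[idx]              # scalar at position [0, 0, 0]
print(idx, first)           # (0, 0, 0) 0.0
```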
Reasoning
Half-precision (fp16) has a limited dynamic range: the largest finite value is 65504, the smallest positive normal value is about 6.1e-5, and subnormals reach down to about 6e-8. Some operations in the UNet (particularly the attention softmax) and in the VAE (particularly the decoder's group normalization) can overflow this range, producing infinities that turn into NaNs and propagate through the network. Susceptibility varies by model and hardware: consumer GPUs with weak fp16 support (the GTX 16xx series is a frequently reported case) are most affected, while data-center GPUs such as the A100/H100 rarely exhibit the problem. The graduated fix path lets users find the minimum precision increase needed for stable results.
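The overflow mechanism is easy to reproduce outside the UNet. In NumPy's float16, squaring a moderately large activation overflows to infinity, and a subsequent subtraction of infinities (as happens inside a softmax's `x - max(x)` normalization once values saturate) yields NaN:

```python
import numpy as np

a = np.float16(300.0)
sq = a * a                       # 90000 > 65504 -> overflows to inf in fp16
print(sq)                        # inf

logits = np.array([sq, sq], dtype=np.float16)
shifted = logits - logits.max()  # inf - inf -> NaN, poisoning the softmax
print(shifted)                   # [nan nan]
```

The same arithmetic carried out in fp32 (max ~3.4e38) stays finite, which is exactly why upcasting fixes these failures.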
Code Evidence
NaN detection with context-specific messages from `modules/devices.py:242-265`:
```python
def test_for_nans(x, where):
    if shared.cmd_opts.disable_nan_check:
        return

    if not torch.isnan(x[(0, ) * len(x.shape)]):
        return

    if where == "unet":
        message = "A tensor with NaNs was produced in Unet."

        if not shared.cmd_opts.no_half:
            message += (
                " This could be either because there's not enough precision to represent "
                "the picture, or because your video card does not support half type. Try setting the "
                "\"Upcast cross attention layer to float32\" option in Settings > Stable Diffusion "
                "or using the --no-half commandline argument to fix this."
            )

    elif where == "vae":
        message = "A tensor with NaNs was produced in VAE."

        if not shared.cmd_opts.no_half and not shared.cmd_opts.no_half_vae:
            message += (
                " This could be because there's not enough precision to represent the "
                "picture. Try adding --no-half-vae commandline argument to fix this."
            )

    raise NansException(message)
```
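Because the check surfaces as an exception, callers can implement an automatic retry at higher precision; the WebUI offers similar behavior for the VAE via its "Automatically revert VAE to 32-bit floats" setting. A sketch of the pattern, with hypothetical decode functions standing in for the fp16 and fp32 VAE paths:

```python
class NansException(Exception):
    """Raised when a NaN is detected in a model output tensor."""

def decode_fp16(latents):
    # hypothetical fast path: simulate an fp16 decode that produced NaNs
    raise NansException("A tensor with NaNs was produced in VAE.")

def decode_fp32(latents):
    # hypothetical fallback: an fp32 decode that succeeds
    return [0.0 for _ in latents]

def decode_with_fallback(latents):
    """Try the fast fp16 path; on a NaN result, retry once in fp32."""
    try:
        return decode_fp16(latents)
    except NansException:
        return decode_fp32(latents)
```

This keeps fp16 speed for the common case while paying the fp32 cost only on the images that actually need it.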