Implementation:Huggingface Diffusers Video Memory Setup
| Field | Value |
|---|---|
| Type | API Doc |
| Overview | Concrete API calls for enabling model CPU offloading, VAE tiling, and VAE slicing on video generation pipelines |
| Domains | Video Generation, GPU Memory Optimization |
| Workflow | Video_Generation |
| Related Principle | Huggingface_Diffusers_Video_Memory_Management |
| Source | src/diffusers/pipelines/pipeline_utils.py:L1174-L1270, src/diffusers/models/autoencoders/autoencoder_kl_wan.py:L1086-L1114 |
| Last Updated | 2026-02-13 00:00 GMT |
Code Reference
enable_model_cpu_offload
Source: src/diffusers/pipelines/pipeline_utils.py:L1174-L1268
```python
def enable_model_cpu_offload(self, gpu_id: int | None = None, device: torch.device | str = None):
    """
    Offloads all models to CPU using accelerate, reducing memory usage with a low impact on
    performance. Compared to enable_sequential_cpu_offload, this method moves one whole model
    at a time to the accelerator when its forward method is called.
    """
    # ...
    self.to("cpu", silence_dtype_warnings=True)
    empty_device_cache(device.type)

    all_model_components = {k: v for k, v in self.components.items() if isinstance(v, torch.nn.Module)}

    self._all_hooks = []
    hook = None
    for model_str in self.model_cpu_offload_seq.split("->"):
        model = all_model_components.pop(model_str, None)
        if not isinstance(model, torch.nn.Module):
            continue
        _, hook = cpu_offload_with_hook(model, device, prev_module_hook=hook)
        self._all_hooks.append(hook)
```
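The hook chain built in the loop above can be sketched in plain Python (no torch or accelerate required): each wrapped model, when it is about to run, first asks the previous hook to offload its model back to CPU, then moves itself to the accelerator, so at most one model occupies GPU memory at a time. The names here (`FakeModel`, `OffloadHook`) are illustrative stand-ins, not part of the diffusers or accelerate API.

```python
class FakeModel:
    """Stand-in for a torch.nn.Module that just tracks its current device."""
    def __init__(self, name):
        self.name = name
        self.device = "cpu"

class OffloadHook:
    """Mimics the chaining behavior of accelerate's cpu_offload_with_hook."""
    def __init__(self, model, device, prev_hook=None):
        self.model = model
        self.device = device
        self.prev_hook = prev_hook

    def pre_forward(self):
        # Offload the previously used model before loading this one,
        # so only one model occupies the accelerator at a time.
        if self.prev_hook is not None:
            self.prev_hook.offload()
        self.model.device = self.device

    def offload(self):
        self.model.device = "cpu"

# Build the chain in pipeline order, as enable_model_cpu_offload does
# with model_cpu_offload_seq.split("->").
models = [FakeModel(n) for n in "text_encoder->transformer->vae".split("->")]
hooks, prev = [], None
for m in models:
    prev = OffloadHook(m, "cuda:0", prev_hook=prev)
    hooks.append(prev)

# Simulate one forward pass through the pipeline components in order.
for h in hooks:
    h.pre_forward()

# Only the last component used (the VAE) remains on the accelerator.
print([m.device for m in models])  # → ['cpu', 'cpu', 'cuda:0']
```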
enable_tiling (AutoencoderKLWan)
Source: src/diffusers/models/autoencoders/autoencoder_kl_wan.py:L1086-L1114
```python
def enable_tiling(
    self,
    tile_sample_min_height: int | None = None,
    tile_sample_min_width: int | None = None,
    tile_sample_stride_height: float | None = None,
    tile_sample_stride_width: float | None = None,
) -> None:
    """
    Enable tiled VAE decoding. When this option is enabled, the VAE will split the input
    tensor into tiles to compute decoding and encoding in several steps. This is useful for
    saving a large amount of memory and to allow processing larger images.
    """
    self.use_tiling = True
    self.tile_sample_min_height = tile_sample_min_height or self.tile_sample_min_height
    self.tile_sample_min_width = tile_sample_min_width or self.tile_sample_min_width
    self.tile_sample_stride_height = tile_sample_stride_height or self.tile_sample_stride_height
    self.tile_sample_stride_width = tile_sample_stride_width or self.tile_sample_stride_width
```
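Because the stride (192 px by default) is smaller than the minimum tile size (256 px), adjacent tiles overlap by 64 px, and that overlap is blended to hide seams. The resulting tile count per axis can be estimated with a few lines of arithmetic; this is an illustrative sketch of the tiling scheme, not the exact library code, and `tile_grid` is a hypothetical helper name.

```python
import math

def tile_grid(size: int, tile_min: int, stride: int) -> int:
    """Number of tiles along one spatial axis, assuming a tile is placed
    every `stride` pixels until the frame is covered (sketch of the
    diffusers tiling scheme, not the exact library implementation)."""
    if size <= tile_min:
        return 1  # frame fits in a single tile along this axis
    return math.ceil((size - tile_min) / stride) + 1

# Defaults from enable_tiling: 256 px tiles with a 192 px stride,
# i.e. a 64 px overlap between neighboring tiles.
print(256 - 192)                  # → 64
print(tile_grid(720, 256, 192))   # tiles along a 720 px height → 4
print(tile_grid(1280, 256, 192))  # tiles along a 1280 px width → 7
```

Larger strides mean fewer tiles (and less redundant compute) but less overlap for seam blending, which is the trade-off the HunyuanVideo example below adjusts.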
Key Parameters
| Method | Parameter | Description | Default |
|---|---|---|---|
| enable_model_cpu_offload | gpu_id | GPU device ID to use | 0 |
| enable_model_cpu_offload | device | PyTorch device type string | Auto-detected |
| enable_tiling | tile_sample_min_height | Minimum tile height in pixels | 256 |
| enable_tiling | tile_sample_min_width | Minimum tile width in pixels | 256 |
| enable_tiling | tile_sample_stride_height | Stride between vertical tiles | 192 |
| enable_tiling | tile_sample_stride_width | Stride between horizontal tiles | 192 |
I/O Contract
enable_model_cpu_offload
Inputs:
- Pipeline instance with all model components loaded
Outputs:
- Modified pipeline where all model modules have been moved to CPU and wrapped with `accelerate` hooks for automatic GPU migration
Side Effects:
- Clears GPU memory cache
- Sets `self._all_hooks` with the offload hook chain
- Sets `self._offload_device` and `self._offload_gpu_id`
enable_tiling
Inputs:
- VAE instance (`AutoencoderKLWan`, `AutoencoderKLHunyuanVideo`, etc.)
Outputs:
- Modified VAE with `use_tiling = True` and configured tile dimensions
Side Effects:
- `_decode` and `_encode` methods will route to `tiled_decode`/`tiled_encode` when spatial dimensions exceed tile minimums
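The routing condition described above can be expressed as a small predicate. This is an illustrative sketch of the condition, not the exact library code, and `should_tile` is a hypothetical name.

```python
def should_tile(use_tiling: bool, height: int, width: int,
                tile_min_height: int, tile_min_width: int) -> bool:
    # Tiled decode/encode is used only when tiling is enabled AND the
    # sample is larger than a single tile in at least one dimension.
    return use_tiling and (height > tile_min_height or width > tile_min_width)

# A 720x1280 frame exceeds the default 256 px tile minimums, so it tiles;
# a 256x256 frame fits in one tile and takes the plain decode path.
print(should_tile(True, 720, 1280, 256, 256))   # → True
print(should_tile(True, 256, 256, 256, 256))    # → False
print(should_tile(False, 720, 1280, 256, 256))  # → False
```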
External Dependencies
- `accelerate >= 0.17.0` (for `cpu_offload_with_hook`)
- An accelerator device: CPU offloading requires a CUDA, MPS, or XPU device
Usage Examples
Minimal Memory Configuration for Wan 14B
```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline

model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

# Generate - only one component on GPU at a time
output = pipe(prompt="A sunset over mountains", num_frames=81, height=720, width=1280)
```
Custom Tile Sizes for HunyuanVideo
```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# Larger tiles = fewer seams but more memory per tile
pipe.vae.enable_tiling(
    tile_sample_min_height=512,
    tile_sample_min_width=512,
    tile_sample_stride_height=384,
    tile_sample_stride_width=384,
)
```
Combining Tiling and Slicing for Batch Processing
```python
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()  # Process batch elements one at a time
```
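The effect of slicing can be sketched in shape-only terms: the batch is decoded one element at a time and the results concatenated, so peak memory scales with a single sample rather than the whole batch. Everything here (`sliced_decode`, the toy upsampling "decoder") is illustrative, not the diffusers implementation.

```python
import numpy as np

def sliced_decode(decode_one, latents):
    """Sketch of VAE slicing: decode batch elements individually and
    concatenate, trading a little speed for peak-memory savings.
    `decode_one` stands in for a single-sample VAE decode call."""
    return np.concatenate([decode_one(z[None]) for z in latents], axis=0)

# Toy 'decoder' that just 8x-upsamples spatially; shapes only, no real VAE.
decode_one = lambda z: np.repeat(np.repeat(z, 8, axis=-1), 8, axis=-2)

latents = np.zeros((4, 16, 32, 32))  # (batch, channels, h, w)
print(sliced_decode(decode_one, latents).shape)  # → (4, 16, 256, 256)
```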
Related Pages
- Huggingface_Diffusers_Video_Memory_Management (principle for this implementation) - Theory behind memory optimization strategies
- Huggingface_Diffusers_Video_Pipeline_From_Pretrained (prerequisite) - Pipeline must be loaded before enabling optimizations
- Huggingface_Diffusers_Export_To_Video (benefits from tiling) - Decoding step uses the tiling configuration