Implementation:Huggingface Diffusers Video Memory Setup

From Leeroopedia
Field | Value
Type | API Doc
Overview | Concrete API calls for enabling model CPU offloading, VAE tiling, and VAE slicing on video generation pipelines
Domains | Video Generation, GPU Memory Optimization
Workflow | Video_Generation
Related Principle | Huggingface_Diffusers_Video_Memory_Management
Source | src/diffusers/pipelines/pipeline_utils.py:L1174-L1270, src/diffusers/models/autoencoders/autoencoder_kl_wan.py:L1086-L1114
Last Updated | 2026-02-13 00:00 GMT

Code Reference

enable_model_cpu_offload

Source: src/diffusers/pipelines/pipeline_utils.py:L1174-L1268

def enable_model_cpu_offload(self, gpu_id: int | None = None, device: torch.device | str | None = None):
    """
    Offloads all models to CPU using accelerate, reducing memory usage with a low impact on
    performance. Compared to enable_sequential_cpu_offload, this method moves one whole model
    at a time to the accelerator when its forward method is called.
    """
    # ...
    self.to("cpu", silence_dtype_warnings=True)
    empty_device_cache(device.type)

    all_model_components = {k: v for k, v in self.components.items() if isinstance(v, torch.nn.Module)}
    self._all_hooks = []
    hook = None
    for model_str in self.model_cpu_offload_seq.split("->"):
        model = all_model_components.pop(model_str, None)
        if not isinstance(model, torch.nn.Module):
            continue
        _, hook = cpu_offload_with_hook(model, device, prev_module_hook=hook)
        self._all_hooks.append(hook)
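The loop above threads each component's hook through prev_module_hook, so bringing one model onto the accelerator can evict the one before it. A minimal pure-Python sketch of that chaining (chain_hooks and the sequence string are illustrative stand-ins, not Diffusers APIs):

```python
def chain_hooks(model_names):
    """Toy stand-in for the loop in enable_model_cpu_offload: each model's
    hook records the previous hook, mirroring the prev_module_hook chaining
    that lets activating one model offload the one before it."""
    hooks = []
    prev = None
    for name in model_names:
        hook = {"model": name, "prev": prev}
        hooks.append(hook)
        prev = hook
    return hooks

# A plausible offload sequence for a video pipeline (illustrative only):
hooks = chain_hooks("text_encoder->transformer->vae".split("->"))
```

Splitting on "->" reproduces the iteration order the real method derives from model_cpu_offload_seq.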

enable_tiling (AutoencoderKLWan)

Source: src/diffusers/models/autoencoders/autoencoder_kl_wan.py:L1086-L1114

def enable_tiling(
    self,
    tile_sample_min_height: int | None = None,
    tile_sample_min_width: int | None = None,
    tile_sample_stride_height: float | None = None,
    tile_sample_stride_width: float | None = None,
) -> None:
    """
    Enable tiled VAE decoding. When this option is enabled, the VAE will split the input
    tensor into tiles to compute decoding and encoding in several steps. This is useful for
    saving a large amount of memory and for processing larger images.
    """
    self.use_tiling = True
    self.tile_sample_min_height = tile_sample_min_height or self.tile_sample_min_height
    self.tile_sample_min_width = tile_sample_min_width or self.tile_sample_min_width
    self.tile_sample_stride_height = tile_sample_stride_height or self.tile_sample_stride_height
    self.tile_sample_stride_width = tile_sample_stride_width or self.tile_sample_stride_width
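With the default values (256-pixel tiles, 192-pixel strides), adjacent tiles overlap by 64 pixels, and the number of tiles per axis grows with frame size. A rough sketch of that geometry, assuming one tile starts every stride pixels as in the range-based loops typically used by tiled_decode (tile_grid is a hypothetical helper, not a Diffusers API):

```python
import math

def tile_grid(height, width, tile_min=256, stride=192):
    """Estimate the VAE tile layout: one tile starts every `stride` pixels
    along each axis, and neighboring tiles overlap by tile_min - stride
    pixels (the region blended to hide seams)."""
    rows = math.ceil(height / stride)
    cols = math.ceil(width / stride)
    overlap = tile_min - stride
    return rows, cols, overlap

# A 720x1280 frame with the default tile settings:
rows, cols, overlap = tile_grid(720, 1280)
```

Larger strides mean fewer tiles (less compute, lower peak memory savings per tile) but a thinner overlap band for blending.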

Key Parameters

Method | Parameter | Description | Default
enable_model_cpu_offload | gpu_id | GPU device ID to use | 0
enable_model_cpu_offload | device | PyTorch device type string | Auto-detected
enable_tiling | tile_sample_min_height | Minimum tile height in pixels | 256
enable_tiling | tile_sample_min_width | Minimum tile width in pixels | 256
enable_tiling | tile_sample_stride_height | Stride between vertical tiles | 192
enable_tiling | tile_sample_stride_width | Stride between horizontal tiles | 192

I/O Contract

enable_model_cpu_offload

Inputs:

  • Pipeline instance with all model components loaded

Outputs:

  • Modified pipeline where all model modules have been moved to CPU and wrapped with accelerate hooks for automatic GPU migration

Side Effects:

  • Clears GPU memory cache
  • Sets self._all_hooks with the offload hook chain
  • Sets self._offload_device and self._offload_gpu_id

enable_tiling

Inputs:

  • VAE instance (AutoencoderKLWan, AutoencoderKLHunyuanVideo, etc.)

Outputs:

  • Modified VAE with use_tiling = True and configured tile dimensions

Side Effects:

  • _decode and _encode methods will route to tiled_decode/tiled_encode when spatial dimensions exceed tile minimums
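That dispatch can be approximated as a simple predicate (uses_tiled_path is an illustrative helper; the exact condition varies slightly between VAE classes):

```python
def uses_tiled_path(height, width, use_tiling=True,
                    tile_sample_min_height=256, tile_sample_min_width=256):
    """Approximation of the routing described above: _decode/_encode take
    the tiled path only when tiling is enabled AND either spatial dimension
    exceeds its tile minimum; small inputs decode in one pass."""
    return use_tiling and (height > tile_sample_min_height
                           or width > tile_sample_min_width)

uses_tiled_path(720, 1280)   # large frame: tiled path
uses_tiled_path(128, 128)    # small frame: plain decode
```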

External Dependencies

  • accelerate >= 0.17.0 (for cpu_offload_with_hook)
  • Accelerator device (CUDA, MPS, or XPU); CPU offloading moves each module to the accelerator on demand

Usage Examples

Minimal Memory Configuration for Wan 14B

import torch
from diffusers import AutoencoderKLWan, WanPipeline

model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

# Generate - only one component on GPU at a time
output = pipe(prompt="A sunset over mountains", num_frames=81, height=720, width=1280)
frames = output.frames[0]

Custom Tile Sizes for HunyuanVideo

import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# Larger tiles = fewer seams but more memory per tile
pipe.vae.enable_tiling(
    tile_sample_min_height=512,
    tile_sample_min_width=512,
    tile_sample_stride_height=384,
    tile_sample_stride_width=384,
)

Combining Tiling and Slicing for Batch Processing

pipe.vae.enable_tiling()
pipe.vae.enable_slicing()  # Process batch elements one at a time
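enable_slicing trades a little latency for memory by decoding the batch one element at a time. A toy sketch of the mechanics, with sliced_decode and the lambda "decoder" as illustrative stand-ins for the real VAE decode path:

```python
def sliced_decode(decode_fn, latent_batch):
    """Toy stand-in for what enable_slicing does inside VAE decoding:
    process one batch element at a time and stitch the results back
    together, so peak decode memory scales with a single sample rather
    than the whole batch. decode_fn is a hypothetical per-sample decoder."""
    return [decode_fn(sample) for sample in latent_batch]

frames = sliced_decode(lambda z: [v * 2 for v in z], [[1, 2], [3, 4]])
```

Slicing composes with tiling: slicing splits along the batch dimension, tiling along the spatial dimensions.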

Related Pages

Principle:Huggingface_Diffusers_Video_Memory_Management

