
Implementation:Zai org CogVideo Pipeline CPU Offload

From Leeroopedia



Overview

The diffusers library provides concrete methods for memory-efficient video generation through CPU offloading and VAE optimization. These methods allow CogVideoX models to run on consumer GPUs with limited VRAM.

Source

inference/cli_demo.py:L150-152

Signature

# Option A: Lower VRAM, slower
pipe.enable_sequential_cpu_offload()
# Option B: Higher VRAM, faster
pipe.enable_model_cpu_offload()

# Always enable for video:
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

Key Parameters

  • enable_sequential_cpu_offload() -- Moves each sub-model to the GPU only during its forward pass. Lowest VRAM usage, slowest generation.
  • enable_model_cpu_offload() -- Keeps the active model on the GPU and offloads the others. Moderate VRAM savings, faster than sequential offload.
  • vae.enable_slicing() -- Processes video frames one slice at a time in the VAE. Reduces VAE peak memory.
  • vae.enable_tiling() -- Processes spatial dimensions in tiles in the VAE. Reduces VAE spatial memory.

Note: Use either enable_sequential_cpu_offload() or enable_model_cpu_offload(), not both. Always enable both VAE slicing and tiling.
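The choice between the two offload modes is a speed-versus-VRAM trade-off. A minimal sketch of picking a mode from detected VRAM; the helper name and the 24 GB threshold are illustrative assumptions, not diffusers defaults:

```python
def choose_offload(total_vram_gb: float, threshold_gb: float = 24.0) -> str:
    """Pick an offload strategy from available VRAM.

    The 24 GB threshold is an assumption for illustration; tune it for
    your model size and target resolution.
    """
    return "model" if total_vram_gb >= threshold_gb else "sequential"


# Hypothetical cards:
print(choose_offload(48.0))  # plenty of VRAM: keep whole sub-models resident
print(choose_offload(12.0))  # tight VRAM: page each sub-module in and out
```

On a real pipeline the chosen mode maps to `pipe.enable_model_cpu_offload()` or `pipe.enable_sequential_cpu_offload()`; `torch.cuda.get_device_properties(0).total_memory` gives the detected VRAM in bytes.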

Inputs

  • Loaded pipeline -- A CogVideoXPipeline instance that has been loaded from pretrained weights and configured with a scheduler.

Outputs

  • Memory-optimized pipeline -- The same pipeline instance with memory optimization hooks enabled. No change to the pipeline API; generation calls work identically.

Usage Example

from diffusers import CogVideoXPipeline, CogVideoXDPMScheduler
import torch

# Load pipeline
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
)

# Configure scheduler
pipe.scheduler = CogVideoXDPMScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="trailing"
)

# Enable memory optimizations
pipe.enable_sequential_cpu_offload()  # or pipe.enable_model_cpu_offload(), not both
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

# Generate as usual; offloading does not change the call signature.
# The step count and guidance scale here are illustrative values.
video = pipe(
    prompt="A panda playing a guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

Import

Methods are available directly on the pipeline and VAE objects. No additional imports are needed beyond the pipeline itself:

from diffusers import CogVideoXPipeline
# enable_sequential_cpu_offload, enable_model_cpu_offload are pipeline methods
# enable_slicing, enable_tiling are VAE methods

External Dependencies

  • diffusers -- Provides the pipeline and memory optimization methods
  • accelerate -- Required for CPU offloading functionality (used internally by diffusers)
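To see why offloading matters, a rough estimate of the memory the weights alone require. This sketch assumes bfloat16 storage (2 bytes per parameter) and takes the 5 B parameter count from the model name; activations, the text encoder, and the VAE add further overhead on top:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the weights (bfloat16 = 2 bytes)."""
    return num_params * bytes_per_param / 1024**3


# CogVideoX-5b transformer alone: roughly 9.3 GB in bfloat16, before
# counting activations or the other pipeline components.
print(f"{weight_memory_gb(5e9):.1f} GB")
```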
