Implementation:Zai org CogVideo Pipeline CPU Offload
Overview
A concrete tool, provided by the diffusers library, for memory-efficient video generation through CPU offloading and VAE slicing and tiling. These methods let CogVideoX models run on consumer GPUs with limited VRAM, at the cost of slower inference.
Source
inference/cli_demo.py:L150-152
Signature
# Option A: Lower VRAM, slower
pipe.enable_sequential_cpu_offload()
# Option B: Higher VRAM, faster
pipe.enable_model_cpu_offload()
# Always enable for video:
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
Key Parameters
| Method | Description | VRAM Impact |
|---|---|---|
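| enable_sequential_cpu_offload() | Keeps weights on the CPU and moves each submodule to the GPU only for its forward pass | Lowest VRAM usage, slowest inference |
| enable_model_cpu_offload() | Moves one whole component (e.g. text encoder, transformer, VAE) to the GPU at a time and keeps the others on the CPU | Moderate VRAM savings, smaller speed penalty |
| vae.enable_slicing() | Splits the VAE input batch into slices and processes them one at a time | Reduces VAE peak memory |
| vae.enable_tiling() | Splits the spatial dimensions into overlapping tiles and processes them one at a time in the VAE | Reduces VAE peak memory at high resolutions |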
Note: Use either enable_sequential_cpu_offload() or enable_model_cpu_offload(), not both. Always enable both VAE slicing and tiling.
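The rule in the note above can be captured in a small helper. This is a minimal sketch, not code from the CogVideo repository: the function name `apply_memory_optimizations` and the `low_vram` flag are hypothetical and introduced only for illustration. It calls only the four methods listed in the table.

```python
def apply_memory_optimizations(pipe, low_vram=True):
    """Enable exactly one CPU-offload mode, plus VAE slicing and tiling.

    `low_vram` is a hypothetical flag for this sketch, not a diffusers option.
    """
    if low_vram:
        # Option A: lowest VRAM usage, slower inference
        pipe.enable_sequential_cpu_offload()
    else:
        # Option B: higher VRAM usage, faster inference
        pipe.enable_model_cpu_offload()

    # Both VAE optimizations are enabled regardless of the offload mode
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

Because the two offload calls sit in separate branches of one `if`, they cannot both be applied to the same pipeline through this helper.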
Inputs
- Loaded pipeline -- A CogVideoXPipeline instance that has been loaded from pretrained weights and configured with a scheduler.
Outputs
- Memory-optimized pipeline -- The same pipeline instance with memory optimization hooks enabled. No change to the pipeline API; generation calls work identically.
Usage Example
import torch
from diffusers import CogVideoXPipeline, CogVideoXDPMScheduler

# Load pipeline
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16,
)

# Configure scheduler
pipe.scheduler = CogVideoXDPMScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="trailing",
)

# Enable memory optimizations
pipe.enable_sequential_cpu_offload()  # or pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
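Since the optimizations do not change the pipeline API, a generation call on the optimized pipeline looks the same as on an unoptimized one. The wrapper below is a hedged sketch: the name `generate_frames` is hypothetical, and the default values for `num_frames`, `num_inference_steps`, and `guidance_scale` are illustrative assumptions rather than settings confirmed by this page.

```python
def generate_frames(pipe, prompt, num_frames=49, num_inference_steps=50,
                    guidance_scale=6.0):
    """Call a (memory-optimized) pipeline and return the first video's frames.

    Default values are illustrative assumptions, not recommended settings.
    """
    result = pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    )
    # The pipeline output exposes one list of frames per generated video
    return result.frames[0]
```

With the `pipe` object from the example above, this would be called as `frames = generate_frames(pipe, "A panda playing a guitar")`; the prompt here is a placeholder.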
Import
Methods are available directly on the pipeline and VAE objects. No additional imports are needed beyond the pipeline itself:
from diffusers import CogVideoXPipeline
# enable_sequential_cpu_offload, enable_model_cpu_offload are pipeline methods
# enable_slicing, enable_tiling are VAE methods
External Dependencies
- diffusers -- Provides the pipeline and memory optimization methods
- accelerate -- Required for CPU offloading functionality (used internally by diffusers)
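Because offloading depends on accelerate being installed alongside diffusers, it can help to check for both before loading any weights. The following is a minimal sketch using only the standard library; the function name `missing_dependencies` is hypothetical.

```python
import importlib.util


def missing_dependencies(required=("diffusers", "accelerate")):
    """Return the names of required top-level packages that cannot be found."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All offloading dependencies are available.")
```

The check only confirms that the packages can be located; it does not verify versions or GPU availability.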