
Implementation:Zai org CogVideo Pipeline CPU Offload

From Leeroopedia



Overview

The diffusers library provides concrete methods for memory-efficient video generation through CPU offloading and VAE optimization. These methods allow CogVideoX models to run on consumer GPUs with limited VRAM.

Source

inference/cli_demo.py:L150-152

Signature

# Option A: Lower VRAM, slower
pipe.enable_sequential_cpu_offload()
# Option B: Higher VRAM, faster
pipe.enable_model_cpu_offload()

# Always enable for video:
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

Key Parameters

  • enable_sequential_cpu_offload() -- Moves each sub-model to the GPU only during its forward pass. Lowest VRAM usage, slowest generation.
  • enable_model_cpu_offload() -- Keeps the active model on the GPU and offloads the others. Moderate VRAM savings, faster than sequential offload.
  • vae.enable_slicing() -- Processes video frames one slice at a time in the VAE. Reduces VAE peak memory.
  • vae.enable_tiling() -- Processes spatial dimensions in tiles in the VAE. Reduces VAE spatial memory.

Note: Use either enable_sequential_cpu_offload() or enable_model_cpu_offload(), not both. Always enable both VAE slicing and tiling.
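The choice between the two offload modes is a speed-versus-VRAM trade-off. A minimal sketch of picking a mode from detected VRAM; the helper name and the 24 GB threshold are illustrative assumptions, not diffusers defaults:

```python
def choose_offload(total_vram_gb: float, threshold_gb: float = 24.0) -> str:
    """Pick an offload strategy from available VRAM.

    The 24 GB threshold is an assumption for illustration; tune it for
    your model size and target resolution.
    """
    return "model" if total_vram_gb >= threshold_gb else "sequential"


# Hypothetical cards:
print(choose_offload(48.0))  # plenty of VRAM: keep whole sub-models resident
print(choose_offload(12.0))  # tight VRAM: page each sub-module in and out
```

On a real pipeline the chosen mode maps to `pipe.enable_model_cpu_offload()` or `pipe.enable_sequential_cpu_offload()`; `torch.cuda.get_device_properties(0).total_memory` gives the detected VRAM in bytes.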

Inputs

  • Loaded pipeline -- A CogVideoXPipeline instance that has been loaded from pretrained weights and configured with a scheduler.

Outputs

  • Memory-optimized pipeline -- The same pipeline instance with memory optimization hooks enabled. No change to the pipeline API; generation calls work identically.

Usage Example

from diffusers import CogVideoXPipeline, CogVideoXDPMScheduler
import torch

# Load pipeline
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
)

# Configure scheduler
pipe.scheduler = CogVideoXDPMScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="trailing"
)

# Enable memory optimizations
pipe.enable_sequential_cpu_offload()  # or pipe.enable_model_cpu_offload(), not both
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

# Generate as usual; offloading does not change the call signature.
# The step count and guidance scale here are illustrative values.
video = pipe(
    prompt="A panda playing a guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

Import

Methods are available directly on the pipeline and VAE objects. No additional imports are needed beyond the pipeline itself:

from diffusers import CogVideoXPipeline
# enable_sequential_cpu_offload, enable_model_cpu_offload are pipeline methods
# enable_slicing, enable_tiling are VAE methods

External Dependencies

  • diffusers -- Provides the pipeline and memory optimization methods
  • accelerate -- Required for CPU offloading functionality (used internally by diffusers)
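To see why offloading matters, a rough estimate of the memory the weights alone require. This sketch assumes bfloat16 storage (2 bytes per parameter) and takes the 5 B parameter count from the model name; activations, the text encoder, and the VAE add further overhead on top:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the weights (bfloat16 = 2 bytes)."""
    return num_params * bytes_per_param / 1024**3


# CogVideoX-5b transformer alone: roughly 9.3 GB in bfloat16, before
# counting activations or the other pipeline components.
print(f"{weight_memory_gb(5e9):.1f} GB")
```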
