Principle:Zai org CogVideo DDIM Pipeline Loading

From Leeroopedia


Attribute | Value
Principle Name | DDIM Pipeline Loading
Workflow | Video Editing DDIM Inversion
Step | 2 of 6
Type | Model Initialization
Repository | zai-org/CogVideo
Paper | CogVideoX
Last Updated | 2026-02-10 00:00 GMT

Overview

Technique for loading the CogVideoX pipeline with DDIM-specific schedulers for video inversion and editing. DDIM inversion requires both a forward and an inverse scheduler, and it requires the CogVideoX-5B variant specifically, because that variant uses rotary positional embeddings.

Description

DDIM inversion requires loading the CogVideoX pipeline with two schedulers:

  1. CogVideoXDDIMScheduler (forward): Used during reconstruction to denoise from inverted noise back to a clean video. This scheduler implements the deterministic DDIM denoising update, which this page calls the forward step.
  2. DDIMInverseScheduler (inverse): Used during inversion to map clean video latents to their noise-space representations. This scheduler reverses the forward DDIM steps.

The pipeline is moved directly onto the GPU (no CPU offloading) because both the forward and inverse passes run in the same session, and CPU offloading would add excessive data transfer overhead between them.

Important constraint: Only the CogVideoX-5B variant is supported for DDIM inversion because it uses rotary positional embeddings, which the inversion process needs in order to produce faithful reconstructions. The 2B variant does not use rotary positional embeddings, so it is not supported.
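The loading step described above can be sketched as follows. This is a minimal illustration, not the repository's script: the function names `load_ddim_pipeline` and `require_rotary_embeddings`, the model identifier `THUDM/CogVideoX-5b`, the `bfloat16` dtype, and the use of `from_config` to build the inverse scheduler are all assumptions made for the example. The heavy imports are deferred into the function so the sketch can be read and imported without `torch` or `diffusers` installed.

```python
def require_rotary_embeddings(transformer_config):
    """Raise if the transformer config does not enable rotary positional embeddings."""
    # Assumption: the config exposes a boolean `use_rotary_positional_embeddings`.
    if not getattr(transformer_config, "use_rotary_positional_embeddings", False):
        raise ValueError(
            "DDIM inversion expects a RoPE-based checkpoint such as CogVideoX-5B."
        )


def load_ddim_pipeline(model_path="THUDM/CogVideoX-5b", device="cuda"):
    """Load the pipeline on one device and return it with an inverse scheduler.

    Hypothetical helper: model_path, device, and dtype are illustrative defaults.
    """
    # Deferred imports: only needed when the function is actually called.
    import torch
    from diffusers import (
        CogVideoXDDIMScheduler,
        CogVideoXPipeline,
        DDIMInverseScheduler,
    )

    pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16)
    pipe = pipe.to(device)  # keep every component on the GPU, no CPU offloading

    # Reject checkpoints without rotary positional embeddings (for example 2B).
    require_rotary_embeddings(pipe.transformer.config)

    # Forward scheduler: deterministic DDIM denoising used for reconstruction.
    pipe.scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config)
    # Inverse scheduler: built from the same config so both share one schedule.
    inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
    return pipe, inverse_scheduler
```

Calling `load_ddim_pipeline()` downloads the checkpoint and needs a GPU with enough memory for the full 5B pipeline, so it is best run once at the start of the editing session and the returned objects reused for both passes.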

Usage

Use DDIM Pipeline Loading at the beginning of the video editing workflow, before video encoding and inversion. The loaded pipeline provides the VAE (for encoding/decoding), transformer (for denoising), text encoder (for prompt conditioning), and schedulers (for forward/inverse DDIM).

Theoretical Basis

DDIM inversion requires a deterministic (non-stochastic) scheduler to ensure invertibility. The standard DDPM scheduler introduces random noise at each step, making the process non-invertible. The DDIM scheduler removes this stochasticity by using a deterministic mapping:

Forward DDIM step:

x_{t-1} = sqrt(alpha_{t-1}) * x_0_pred + sqrt(1 - alpha_{t-1}) * epsilon_pred

Inverse DDIM step (reversing the above):

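x_{t+1} = sqrt(alpha_{t+1}) * x_0_pred + sqrt(1 - alpha_{t+1}) * epsilon_pred

In both steps, alpha_t denotes the cumulative product of the noise schedule at step t, epsilon_pred is the noise predicted by the model from the current latent x_t, and x_0_pred = (x_t - sqrt(1 - alpha_t) * epsilon_pred) / sqrt(alpha_t) is the corresponding estimate of the clean latent.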

The DDIMInverseScheduler reverses the forward DDIM process, mapping clean latents to their noise-space representations. The deterministic nature of DDIM ensures that forward(inverse(x)) approximately equals x, enabling faithful reconstruction.
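The round trip can be illustrated with a scalar toy example in plain Python. This is a sketch under stated assumptions, not the scheduler implementation: the linear beta schedule, the helper names, and the stand-in noise predictor `toy_eps` are all hypothetical. It shows why the reconstruction is only approximate: inversion evaluates the noise prediction at the less noisy latent, while reconstruction evaluates it at the noisier one, so the two directions cancel exactly only when the prediction does not change between them.

```python
import math


def make_alphas_cumprod(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Return [1.0, a_1, ..., a_T] for a toy linear beta schedule (assumption)."""
    alphas = [1.0]  # index 0 is the clean, noise-free level
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        alphas.append(alphas[-1] * (1.0 - beta))
    return alphas


def ddim_move(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two noise levels (no random term)."""
    x0_pred = (x - math.sqrt(1.0 - alpha_from) * eps) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * eps


def invert(x_clean, eps_model, alphas):
    """Inverse pass: step t -> t+1, from the clean value toward noise."""
    x = x_clean
    for t in range(len(alphas) - 1):
        x = ddim_move(x, eps_model(x, t), alphas[t], alphas[t + 1])
    return x


def reconstruct(x_noisy, eps_model, alphas):
    """Forward pass in this page's terminology: step t -> t-1, back to clean."""
    x = x_noisy
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_move(x, eps_model(x, t), alphas[t], alphas[t - 1])
    return x


alphas = make_alphas_cumprod(50)
toy_eps = lambda x, t: 0.1 * x  # hypothetical stand-in for the transformer

x_clean = 1.0
x_noisy = invert(x_clean, toy_eps, alphas)
x_recon = reconstruct(x_noisy, toy_eps, alphas)
round_trip_error = abs(x_recon - x_clean)  # small but nonzero
```

With a predictor that ignores its input (for example `lambda x, t: 0.3`), the same two functions recover the starting value up to floating-point rounding, which is the idealized case behind forward(inverse(x)) = x.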

Rotary positional embeddings (RoPE) in the 5B model provide position-dependent attention that is critical for maintaining temporal coherence during the inversion-reconstruction cycle.
