Principle:Zai org CogVideo DDIM Pipeline Loading

Attribute	Value
Principle Name	DDIM Pipeline Loading
Workflow	Video Editing DDIM Inversion
Step	2 of 6
Type	Model Initialization
Repository	zai-org/CogVideo
Paper	CogVideoX
Last Updated	2026-02-10 00:00 GMT

Overview

Technique for loading the CogVideoX pipeline with DDIM-specific schedulers for video inversion and editing. DDIM inversion requires both a forward and an inverse scheduler, and it is supported only on the CogVideoX-5B variant, which uses rotary positional embeddings.

Description

DDIM inversion requires loading the CogVideoX pipeline with two schedulers:

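CogVideoXDDIMScheduler (forward): Used during reconstruction to denoise from inverted noise back to clean video. This scheduler implements the deterministic DDIM sampling update. On this page, "forward" refers to this sampling direction (noise to video), not to the noise-adding forward diffusion process.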
DDIMInverseScheduler (inverse): Used during inversion to map clean video latents to their noise-space representations. This scheduler reverses the forward DDIM steps.

The pipeline is loaded to GPU directly (no CPU offloading) since both forward and inverse passes are needed in the same session, and CPU offloading would introduce excessive data transfer overhead.

Important constraint: Only the CogVideoX-5B variant is supported for DDIM inversion because it uses rotary positional embeddings, which the inversion process relies on to produce faithful reconstructions. The 2B variant uses sinusoidal positional embeddings instead and is not supported.
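The loading step described above might be sketched as follows. This is a minimal sketch and not the repository's exact code: the function names `supports_ddim_inversion` and `load_ddim_pipeline`, the default model identifier, and the choice of `bfloat16` are assumptions made for illustration. The schedulers are built with the generic `from_config` method in diffusers; the actual script may construct them differently.

```python
def supports_ddim_inversion(transformer_config) -> bool:
    """Return True when the transformer config reports rotary positional embeddings."""
    return bool(getattr(transformer_config, "use_rotary_positional_embeddings", False))


def load_ddim_pipeline(model_path: str = "THUDM/CogVideoX-5b", device: str = "cuda"):
    """Load the pipeline on one device and build both DDIM schedulers.

    model_path and device defaults are illustrative assumptions.
    """
    # Heavy imports stay inside the function so the check above can be
    # used (and tested) without torch or diffusers installed.
    import torch
    from diffusers import CogVideoXDDIMScheduler, CogVideoXPipeline, DDIMInverseScheduler

    pipeline = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16)

    # Reject variants without rotary positional embeddings (for example 2B).
    if not supports_ddim_inversion(pipeline.transformer.config):
        raise NotImplementedError("DDIM inversion is only supported for CogVideoX-5B.")

    # Move everything to the GPU directly; no CPU offloading is enabled.
    pipeline.to(device)

    # Forward scheduler: deterministic denoising during reconstruction.
    pipeline.scheduler = CogVideoXDDIMScheduler.from_config(pipeline.scheduler.config)
    # Inverse scheduler: maps clean latents to noise during inversion.
    inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
    return pipeline, inverse_scheduler
```

Keeping the variant check in a separate function lets it run before any weights are moved to the GPU, so an unsupported checkpoint fails early.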

Usage

Use DDIM Pipeline Loading at the beginning of the video editing workflow, before video encoding and inversion. The loaded pipeline provides the VAE (for encoding/decoding), transformer (for denoising), text encoder (for prompt conditioning), and schedulers (for forward/inverse DDIM).
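Before moving on to encoding and inversion, it can help to confirm that the loaded object exposes the four components listed above. The helper below is a hypothetical convenience, not part of the repository; it only assumes that the components are available as attributes named `vae`, `transformer`, `text_encoder`, and `scheduler`, as they are on a diffusers `CogVideoXPipeline`.

```python
REQUIRED_COMPONENTS = ("vae", "transformer", "text_encoder", "scheduler")


def missing_components(pipeline) -> list:
    """Return the names of required components that are absent or None."""
    return [name for name in REQUIRED_COMPONENTS if getattr(pipeline, name, None) is None]
```

An empty list means the later workflow steps can proceed; a non-empty list names what still has to be loaded.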

Theoretical Basis

DDIM inversion requires a deterministic (non-stochastic) scheduler to ensure invertibility. The standard DDPM scheduler introduces random noise at each step, making the process non-invertible. The DDIM scheduler removes this stochasticity by using a deterministic mapping:

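Forward (denoising) DDIM step:

x_{t-1} = sqrt(alpha_{t-1}) * x_0_pred + sqrt(1 - alpha_{t-1}) * epsilon_pred

Inverse DDIM step (reversing the above):

x_{t+1} = sqrt(alpha_{t+1}) * x_0_pred + sqrt(1 - alpha_{t+1}) * epsilon_pred

In both steps, alpha_t denotes the cumulative noise-schedule product at step t, epsilon_pred is the noise estimate obtained from the model at the current step, and x_0_pred = (x_t - sqrt(1 - alpha_t) * epsilon_pred) / sqrt(alpha_t) is the clean sample that estimate implies.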

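The DDIMInverseScheduler reverses the forward DDIM process, mapping clean latents to their noise-space representations. The deterministic nature of DDIM ensures that forward(inverse(x)) approximately equals x, enabling faithful reconstruction. The round trip is approximate rather than exact because the inverse step evaluates the noise prediction at x_t while the matching forward step evaluates it at x_{t+1}; this mismatch shrinks as the step size decreases.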
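The round-trip behaviour can be illustrated with a toy scalar example. This is a sketch under stated assumptions and does not use the CogVideoX transformer or the diffusers schedulers: `toy_eps_model` is a hypothetical stand-in for the noise predictor (the optimal predictor for unit-variance Gaussian data), and `cosine_alphas` is an illustrative schedule. The same `ddim_step` formula covers both directions; only the order of the cumulative alphas changes.

```python
import math


def ddim_step(x, eps_pred, alpha_from, alpha_to):
    """Deterministic DDIM update from cumulative alpha_from to alpha_to."""
    x0_pred = (x - math.sqrt(1.0 - alpha_from) * eps_pred) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * eps_pred


def toy_eps_model(x, alpha):
    """Hypothetical noise predictor: exact for unit-variance Gaussian data."""
    return math.sqrt(1.0 - alpha) * x


def cosine_alphas(num_steps, theta_min=0.05, theta_max=1.5):
    """Illustrative schedule: num_steps + 1 cumulative alphas, clean to noisy."""
    span = theta_max - theta_min
    return [math.cos(theta_min + span * i / num_steps) ** 2 for i in range(num_steps + 1)]


def round_trip(x, num_steps):
    """Invert x to noise, then denoise it back, using the same step count."""
    alphas = cosine_alphas(num_steps)
    # Inversion: walk the schedule from clean to noisy.
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, toy_eps_model(x, a_from), a_from, a_to)
    # Reconstruction: walk the same schedule from noisy back to clean.
    reversed_alphas = alphas[::-1]
    for a_from, a_to in zip(reversed_alphas[:-1], reversed_alphas[1:]):
        x = ddim_step(x, toy_eps_model(x, a_from), a_from, a_to)
    return x


if __name__ == "__main__":
    x = 0.8
    for steps in (5, 50, 500):
        print(steps, abs(round_trip(x, steps) - x))
```

With a fixed noise prediction a single step is exactly reversible; with a prediction that depends on the current sample, the reconstruction error is non-zero and decreases as the number of steps grows.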

Rotary positional embeddings (RoPE) in the 5B model provide position-dependent attention that is critical for maintaining temporal coherence during the inversion-reconstruction cycle.

Related Pages

Implementation:Zai_org_CogVideo_DDIM_CogVideoXPipeline_From_Pretrained -- Implementation of pipeline loading
Zai_org_CogVideo_Video_Loading_and_Preprocessing -- Previous step: video preprocessing
Zai_org_CogVideo_Video_Encoding -- Next step: encoding video frames using the pipeline's VAE
Zai_org_CogVideo_DDIM_Inversion -- Inversion step that uses the inverse scheduler
