Principle:Zai org CogVideo Scheduler Configuration

Overview

Technique for selecting and configuring the noise scheduler that controls the diffusion sampling process during video generation.

Description

The scheduler defines the noise schedule and the step function used during the denoising (sampling) process. CogVideoX supports DPM and DDIM schedulers. The DPM (Diffusion Probabilistic Model) scheduler is recommended for the 5B models for better quality, while the DDIM scheduler is recommended for the 2B model. Both are configured with the "trailing" timestep spacing strategy so that the inference timesteps align with the training noise schedule.

The scheduler is responsible for:

Defining the noise schedule -- The sequence of noise levels from pure noise to clean signal
Computing the step function -- How to update latents at each denoising step given the model prediction
Managing timesteps -- Selecting which timesteps to use during the sampling process

Usage

Configure the scheduler after loading the pipeline and before generating videos. The scheduler choice depends on the model variant:

Model Variant	Recommended Scheduler	Reasoning
CogVideoX-5b / CogVideoX1.5-5B	CogVideoXDPMScheduler	Better quality with higher-order ODE solver
CogVideoX-2b	CogVideoXDDIMScheduler	Better compatibility with 2B model training

Both schedulers should use timestep_spacing="trailing" for proper alignment with the training noise schedule.
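
The selection rule in the table can be sketched as a small helper. This is a minimal sketch, not the project's own code: the function names `scheduler_class_name` and `configure_scheduler` are hypothetical, and detecting the variant from a "-2b" substring in the model name is an assumption made for illustration. The `from_config(..., timestep_spacing="trailing")` call follows the usual diffusers pattern of rebuilding a scheduler from the pipeline's existing config.

```python
def scheduler_class_name(model_id: str) -> str:
    """Return the scheduler class name recommended for a model variant."""
    # Assumption: the 2B variant is recognizable from "-2b" in the model name;
    # every other variant (5B, 1.5-5B) falls through to the DPM scheduler.
    if "-2b" in model_id.lower():
        return "CogVideoXDDIMScheduler"
    return "CogVideoXDPMScheduler"


def configure_scheduler(pipe, model_id: str):
    """Replace pipe.scheduler with the recommended one, reusing its config."""
    # Imported lazily so scheduler_class_name() works without diffusers installed.
    import diffusers

    scheduler_cls = getattr(diffusers, scheduler_class_name(model_id))
    # Keyword arguments passed to from_config override values in the stored config.
    pipe.scheduler = scheduler_cls.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )
    return pipe
```

With a pipeline already loaded, calling `configure_scheduler(pipe, "CogVideoX-2b")` would install the DDIM scheduler, while any 5B model name would install the DPM scheduler.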

Theoretical Basis

DPM-Solver

DPM-Solver uses higher-order ODE solvers for faster convergence with fewer steps. It formulates the reverse diffusion process as solving an ordinary differential equation (ODE) and applies multi-step methods to achieve higher accuracy per step compared to first-order methods.
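
To illustrate the difference between a first-order and a higher-order multi-step update, here is a scalar sketch in the data-prediction (DPM-Solver++ style) form. It is an illustration under stated assumptions, not the code of CogVideoXDPMScheduler: the function names are hypothetical, samples are plain floats rather than latent tensors, and `alpha`/`sigma` denote the signal and noise scales at each noise level, with the log signal-to-noise ratio log(alpha/sigma) increasing as denoising proceeds.

```python
import math


def log_snr(alpha: float, sigma: float) -> float:
    """Half log signal-to-noise ratio, lambda = log(alpha / sigma)."""
    return math.log(alpha / sigma)


def dpm_first_order(x_s, x0_s, alpha_s, sigma_s, alpha_t, sigma_t):
    """One first-order step from noise level s to t using the current x0 estimate."""
    h = log_snr(alpha_t, sigma_t) - log_snr(alpha_s, sigma_s)
    # math.expm1(-h) computes exp(-h) - 1 accurately for small h.
    return (sigma_t / sigma_s) * x_s - alpha_t * math.expm1(-h) * x0_s


def dpm_second_order_multistep(x_s, x0_s, x0_prev,
                               alpha_prev, sigma_prev,
                               alpha_s, sigma_s,
                               alpha_t, sigma_t):
    """Second-order multi-step update: reuses the x0 estimate from the previous step."""
    lam_prev = log_snr(alpha_prev, sigma_prev)
    lam_s = log_snr(alpha_s, sigma_s)
    lam_t = log_snr(alpha_t, sigma_t)
    h = lam_t - lam_s
    r = (lam_s - lam_prev) / h
    # Linear extrapolation of the x0 estimate from the two most recent predictions.
    d = x0_s + (x0_s - x0_prev) / (2.0 * r)
    return (sigma_t / sigma_s) * x_s - alpha_t * math.expm1(-h) * d
```

The second-order variant costs no extra model evaluations because it reuses the previous prediction, which is why multi-step solvers can reach a given accuracy in fewer steps. When the two x0 estimates agree, both functions return the same value.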

DDIM

DDIM (Denoising Diffusion Implicit Models) uses a deterministic reverse process. Given the noise prediction at each step, DDIM computes the denoised sample using a non-Markovian update rule that allows skipping steps while maintaining sample quality.
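
The deterministic update described above can be sketched for a single scalar sample. This is a minimal sketch assuming the model predicts the noise (as in the description above) and that `alpha_bar_t` and `alpha_bar_prev` are the cumulative signal coefficients at the current and target timesteps. The function name `ddim_step` is hypothetical, and the actual CogVideoXDDIMScheduler may use a different prediction type and operates on latent tensors.

```python
import math


def ddim_step(x_t: float, eps_pred: float,
              alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """Deterministic DDIM update (eta = 0) from timestep t to an earlier timestep."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the target noise level using the same predicted noise.
    return (math.sqrt(alpha_bar_prev) * x0_pred
            + math.sqrt(1.0 - alpha_bar_prev) * eps_pred)
```

Because the target timestep enters only through `alpha_bar_prev`, it does not have to be adjacent to the current one, which is what allows DDIM to skip steps.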

Trailing Timestep Spacing

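Trailing timestep spacing aligns the inference schedule with training by anchoring the selected timesteps to the end of the training noise range, so sampling starts from the highest-noise training timestep rather than from a slightly lower one. This keeps the pure-noise starting latents consistent with the noise level the model is first asked to denoise, which matters most for generation quality when using far fewer inference steps than training steps.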
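
To make the spacing concrete, the sketch below computes trailing timesteps and, for comparison, the "leading" alternative. It is written in pure Python and mirrors the spacing formulas as commonly implemented in diffusers schedulers. The function names are hypothetical, and 1000 training timesteps is an assumed value used only in the usage note.

```python
def trailing_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list:
    """Evenly spaced timesteps counted back from the last training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


def leading_timesteps(num_train_timesteps: int, num_inference_steps: int,
                      steps_offset: int = 0) -> list:
    """Evenly spaced timesteps counted up from zero, returned in descending order."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [
        i * step_ratio + steps_offset
        for i in reversed(range(num_inference_steps))
    ]
```

With 1000 training timesteps and 4 inference steps, `trailing_timesteps(1000, 4)` gives `[999, 749, 499, 249]`, while `leading_timesteps(1000, 4)` gives `[750, 500, 250, 0]`. Only the trailing schedule starts at the final training timestep.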
