Heuristic: zai-org/CogVideo Scheduler and Guidance Selection
| Knowledge Sources | |
|---|---|
| Domains | Inference, Diffusion_Sampling, Video_Generation |
| Last Updated | 2026-02-10 02:00 GMT |
Overview
Scheduler selection: DDIM for CogVideoX-2B, DPM for CogVideoX-5B. Dynamic CFG should be enabled only with the DPM scheduler. The default guidance scale is 6.0 with 50 inference steps.
Description
CogVideoX models are optimized for different diffusion schedulers depending on model size. The 2B model works best with CogVideoXDDIMScheduler, while 5B models use CogVideoXDPMScheduler. The `use_dynamic_cfg` flag, which adjusts guidance strength during sampling, is only compatible with the DPM scheduler and produces incorrect results with DDIM. These are not interchangeable choices — using the wrong scheduler/CFG combination degrades output quality.
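A minimal setup sketch for the 5B case, using the `diffusers` pipeline classes named in this note. The model ID and exact call signature follow typical CogVideoX examples and may differ across `diffusers` versions; the constants mirror the defaults documented here.

```python
# Sketch of configuring inference per the rule above (5B -> DPM + dynamic CFG).
GUIDANCE_SCALE = 6.0        # CLI default documented in this note
NUM_INFERENCE_STEPS = 50    # recommended for most cases

def generate_5b(prompt: str):
    # Imports are local so this module loads without torch/diffusers installed.
    import torch
    from diffusers import CogVideoXPipeline, CogVideoXDPMScheduler

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    )
    # Swap in the DPM scheduler recommended for 5B models.
    pipe.scheduler = CogVideoXDPMScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )
    return pipe(
        prompt=prompt,
        guidance_scale=GUIDANCE_SCALE,
        num_inference_steps=NUM_INFERENCE_STEPS,
        use_dynamic_cfg=True,  # DPM only; set to False when using DDIM
    ).frames[0]
```

For the 2B model the same sketch would instead keep `CogVideoXDDIMScheduler` and pass `use_dynamic_cfg=False`.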
Usage
Apply this heuristic when configuring the inference pipeline to select the correct scheduler, guidance scale, and dynamic CFG settings based on the model being used.
The Insight (Rule of Thumb)
- CogVideoX-2B: Use `CogVideoXDDIMScheduler`. Set `use_dynamic_cfg=False`.
- CogVideoX-5B / CogVideoX-5B-I2V: Use `CogVideoXDPMScheduler`. Set `use_dynamic_cfg=True`.
- CogVideoX1.5-5B: Use `CogVideoXDPMScheduler`. Set `use_dynamic_cfg=True`.
- Guidance scale: Default 6.0 (CLI), 7.0 (Gradio demo, intentionally fixed).
- Inference steps: Default 50. "50 steps are recommended for most cases."
- Long prompts: Essential for quality. Use an LLM to expand short prompts to match the model's training distribution.
- Trade-off: More inference steps produce marginal quality improvements at a linear increase in cost. 50 is the sweet spot.
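The rules above can be captured in a small helper. `select_sampling_config`, its matching logic, and its return shape are hypothetical illustrations, not part of the CogVideo repository:

```python
# Hypothetical helper mapping a CogVideoX model ID to the scheduler and
# CFG settings recommended above.
def select_sampling_config(model_id: str) -> dict:
    name = model_id.lower()
    if "cogvideox-2b" in name:
        # 2B: DDIM scheduler, dynamic CFG must stay off
        scheduler, dynamic_cfg = "CogVideoXDDIMScheduler", False
    else:
        # 5B, 5B-I2V, and 1.5-5B: DPM scheduler with dynamic CFG
        scheduler, dynamic_cfg = "CogVideoXDPMScheduler", True
    return {
        "scheduler": scheduler,
        "use_dynamic_cfg": dynamic_cfg,
        "guidance_scale": 6.0,   # CLI default; the Gradio demo pins 7.0
        "num_inference_steps": 50,
    }
```

Keeping scheduler choice and `use_dynamic_cfg` in one function makes it harder to ship the incompatible DDIM + dynamic-CFG combination.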
Reasoning
From `inference/cli_demo.py:136-137`:
# We recommend using `CogVideoXDDIMScheduler` for CogVideoX-2B.
# using `CogVideoXDPMScheduler` for CogVideoX-5B / CogVideoX-5B-I2V.
Dynamic CFG compatibility from `inference/cli_demo.py:166`:
use_dynamic_cfg=True, # This is used for DPM scheduler, for DDIM scheduler, it should be False
Fixed guidance scale in Gradio demo from `inference/gradio_composite_demo/app.py:470`:
guidance_scale=7.0, # NOT Changed
Prompt importance from `README.md:91-92`:
"This is crucial because the model is trained with long prompts, and a good prompt directly impacts the quality of the video generation."