Heuristic: zai-org/CogVideo Scheduler and Guidance Selection
| Knowledge Sources | |
|---|---|
| Domains | Inference, Diffusion_Sampling, Video_Generation |
| Last Updated | 2026-02-10 02:00 GMT |
Overview
Scheduler selection: DDIM for CogVideoX-2B, DPM for CogVideoX-5B. Dynamic CFG should be enabled only with the DPM scheduler. The default guidance scale is 6.0 with 50 inference steps.
Description
CogVideoX models are optimized for different diffusion schedulers depending on model size. The 2B model works best with CogVideoXDDIMScheduler, while 5B models use CogVideoXDPMScheduler. The `use_dynamic_cfg` flag, which adjusts guidance strength during sampling, is only compatible with the DPM scheduler and produces incorrect results with DDIM. These are not interchangeable choices — using the wrong scheduler/CFG combination degrades output quality.
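A minimal setup sketch for the 5B case, using the `diffusers` pipeline classes named in this note. The model ID and exact call signature follow typical CogVideoX examples and may differ across `diffusers` versions; the constants mirror the defaults documented here.

```python
# Sketch of configuring inference per the rule above (5B -> DPM + dynamic CFG).
GUIDANCE_SCALE = 6.0        # CLI default documented in this note
NUM_INFERENCE_STEPS = 50    # recommended for most cases

def generate_5b(prompt: str):
    # Imports are local so this module loads without torch/diffusers installed.
    import torch
    from diffusers import CogVideoXPipeline, CogVideoXDPMScheduler

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    )
    # Swap in the DPM scheduler recommended for 5B models.
    pipe.scheduler = CogVideoXDPMScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )
    return pipe(
        prompt=prompt,
        guidance_scale=GUIDANCE_SCALE,
        num_inference_steps=NUM_INFERENCE_STEPS,
        use_dynamic_cfg=True,  # DPM only; set to False when using DDIM
    ).frames[0]
```

For the 2B model the same sketch would instead keep `CogVideoXDDIMScheduler` and pass `use_dynamic_cfg=False`.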
Usage
Apply this heuristic when configuring the inference pipeline to select the correct scheduler, guidance scale, and dynamic CFG settings based on the model being used.
The Insight (Rule of Thumb)
- CogVideoX-2B: Use `CogVideoXDDIMScheduler`. Set `use_dynamic_cfg=False`.
- CogVideoX-5B / CogVideoX-5B-I2V: Use `CogVideoXDPMScheduler`. Set `use_dynamic_cfg=True`.
- CogVideoX1.5-5B: Use `CogVideoXDPMScheduler`. Set `use_dynamic_cfg=True`.
- Guidance scale: Default 6.0 (CLI), 7.0 (Gradio demo, intentionally fixed).
- Inference steps: Default 50. "50 steps are recommended for most cases."
- Long prompts: Essential for quality. Use an LLM to expand short prompts to match the model's training distribution.
- Trade-off: More inference steps produce marginal quality improvements at a linear increase in cost. 50 is the sweet spot.
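The rules above can be captured in a small helper. `select_sampling_config`, its matching logic, and its return shape are hypothetical illustrations, not part of the CogVideo repository:

```python
# Hypothetical helper mapping a CogVideoX model ID to the scheduler and
# CFG settings recommended above.
def select_sampling_config(model_id: str) -> dict:
    name = model_id.lower()
    if "cogvideox-2b" in name:
        # 2B: DDIM scheduler, dynamic CFG must stay off
        scheduler, dynamic_cfg = "CogVideoXDDIMScheduler", False
    else:
        # 5B, 5B-I2V, and 1.5-5B: DPM scheduler with dynamic CFG
        scheduler, dynamic_cfg = "CogVideoXDPMScheduler", True
    return {
        "scheduler": scheduler,
        "use_dynamic_cfg": dynamic_cfg,
        "guidance_scale": 6.0,   # CLI default; the Gradio demo pins 7.0
        "num_inference_steps": 50,
    }
```

Keeping scheduler choice and `use_dynamic_cfg` in one function makes it harder to ship the incompatible DDIM + dynamic-CFG combination.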
Reasoning
From `inference/cli_demo.py:136-137`:
# We recommend using `CogVideoXDDIMScheduler` for CogVideoX-2B.
# using `CogVideoXDPMScheduler` for CogVideoX-5B / CogVideoX-5B-I2V.
Dynamic CFG compatibility from `inference/cli_demo.py:166`:
use_dynamic_cfg=True, # This is used for DPM scheduler, for DDIM scheduler, it should be False
Fixed guidance scale in Gradio demo from `inference/gradio_composite_demo/app.py:470`:
guidance_scale=7.0, # NOT Changed
Prompt importance from `README.md:91-92`:
"This is crucial because the model is trained with long prompts, and a good prompt directly impacts the quality of the video generation."