
Implementation:Zai org CogVideo DDIM CogVideoXPipeline From Pretrained

From Leeroopedia


Attribute Value
Implementation Name DDIM CogVideoXPipeline From Pretrained
Workflow Video Editing DDIM Inversion
Step 2 of 6
Type Wrapper Doc
Source File inference/ddim_inversion.py:L474-478
Repository zai-org/CogVideo
External Dependencies diffusers (CogVideoXPipeline)
Last Updated 2026-02-10 00:00 GMT

Overview

Loads the CogVideoX pipeline used for DDIM inversion. The pipeline is loaded from a pretrained CogVideoX-5B checkpoint with bfloat16 precision and moved directly to CUDA.

Description

The pipeline loading is a straightforward call to CogVideoXPipeline.from_pretrained with two key constraints:

  1. Model variant: Must use a CogVideoX-5B variant (requires rotary positional embeddings). The 2B variant is not supported.
  2. No CPU offloading: The pipeline is loaded directly to GPU via .to(device="cuda") rather than using enable_model_cpu_offload(), since both forward and inverse passes are needed in the same session.

The loaded pipeline provides access to all components needed for DDIM inversion: pipe.vae, pipe.transformer, pipe.text_encoder, pipe.tokenizer, and pipe.scheduler.
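Because the 2B variant silently differs in its positional-embedding configuration, a lightweight guard on the model path can fail fast before the (slow) weight download begins. The helper below is a hypothetical sketch, not part of the repository:

```python
def assert_5b_variant(model_path: str) -> str:
    """Reject paths that look like CogVideoX-2B checkpoints, which lack
    the rotary positional embeddings this workflow requires.
    Hypothetical guard; a simple name check, not a config inspection."""
    if "2b" in model_path.lower():
        raise ValueError(
            f"{model_path!r} appears to be a 2B variant; "
            "DDIM inversion requires a CogVideoX-5B model."
        )
    return model_path

# Accepts the 5B checkpoint, raises on a 2B path.
assert_5b_variant("THUDM/CogVideoX-5b")
```

A stricter check could instead inspect the loaded transformer's config after `from_pretrained`, but the string check catches the common mistake before any download.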

Usage

from diffusers import CogVideoXPipeline
import torch

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")

Code Reference

Source Location

File Lines Description
inference/ddim_inversion.py L474-478 Pipeline loading

Signature

pipe = CogVideoXPipeline.from_pretrained(
    model_path: str,  # Must be CogVideoX-5B variant
    torch_dtype: torch.dtype = torch.bfloat16
).to(device="cuda")

Import

from diffusers import CogVideoXPipeline

I/O Contract

Inputs

Parameter Type Default Description
model_path str Required Path or HuggingFace model ID for a CogVideoX-5B variant (e.g., "THUDM/CogVideoX-5b")
torch_dtype torch.dtype torch.bfloat16 Model precision; bfloat16 recommended for memory efficiency
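The memory benefit of bfloat16 can be estimated with simple arithmetic: 2 bytes per parameter instead of 4. A rough back-of-envelope sketch, assuming the transformer alone has about 5 billion parameters (an approximation; it excludes the VAE, text encoder, and activations):

```python
def approx_weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory footprint of model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# ~5B parameters: bfloat16 (2 bytes) vs float32 (4 bytes)
bf16_gb = approx_weight_memory_gb(5e9, 2)  # ~9.3 GiB
fp32_gb = approx_weight_memory_gb(5e9, 4)  # ~18.6 GiB
```

This is why bfloat16 is recommended here: since the pipeline stays resident on the GPU (no CPU offloading), halving the weight footprint leaves room for the activations of both the forward and inverse passes.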

Outputs

Output Type Description
pipe CogVideoXPipeline Loaded pipeline on CUDA with VAE, transformer, text encoder, tokenizer, and scheduler

Usage Examples

Example 1: Load from HuggingFace Hub

from diffusers import CogVideoXPipeline
import torch

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")

# Access pipeline components
vae = pipe.vae
transformer = pipe.transformer
text_encoder = pipe.text_encoder

Example 2: Load from local path

pipe = CogVideoXPipeline.from_pretrained(
    "/models/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")

Example 3: Verify scheduler compatibility

from diffusers import CogVideoXDDIMScheduler, DDIMInverseScheduler

# The pipeline's default scheduler can be replaced for inversion
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
forward_scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config)
