Implementation:Zai org CogVideo DDIM CogVideoXPipeline From Pretrained
| Attribute | Value |
|---|---|
| Implementation Name | DDIM CogVideoXPipeline From Pretrained |
| Workflow | Video Editing DDIM Inversion |
| Step | 2 of 6 |
| Type | Wrapper Doc |
| Source File | inference/ddim_inversion.py:L474-478 |
| Repository | zai-org/CogVideo |
| External Dependencies | diffusers (CogVideoXPipeline) |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Implementation of the CogVideoX pipeline loading for DDIM inversion. The pipeline is loaded from a pretrained CogVideoX-5B model path with bfloat16 precision and moved directly to CUDA.
Description
The pipeline loading is a straightforward call to CogVideoXPipeline.from_pretrained with two key constraints:
- Model variant: Must use a CogVideoX-5B variant (requires rotary positional embeddings). The 2B variant is not supported.
- No CPU offloading: The pipeline is loaded directly to the GPU via `.to(device="cuda")` rather than `enable_model_cpu_offload()`, since both the forward and inverse passes run in the same session.

The loaded pipeline provides access to all components needed for DDIM inversion: `pipe.vae`, `pipe.transformer`, `pipe.text_encoder`, `pipe.tokenizer`, and `pipe.scheduler`.
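The 5B-only constraint above can be checked before the expensive `from_pretrained` call. The helper below is a hypothetical sketch written for this page, not part of the repository; it only inspects the model path string:

```python
def assert_5b_variant(model_path: str) -> None:
    # Hypothetical guard: DDIM inversion here requires the rotary positional
    # embeddings that only the CogVideoX-5B checkpoints provide, so reject
    # 2B paths early rather than after the weights have been downloaded.
    if "5b" not in model_path.lower():
        raise ValueError(
            f"Expected a CogVideoX-5B variant, got {model_path!r}; "
            "the 2B variant is not supported for DDIM inversion."
        )

assert_5b_variant("THUDM/CogVideoX-5b")  # passes silently
```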
Usage
```python
from diffusers import CogVideoXPipeline
import torch

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")
```
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| `inference/ddim_inversion.py` | L474-478 | Pipeline loading |
Signature
```python
pipe = CogVideoXPipeline.from_pretrained(
    model_path: str,                           # must be a CogVideoX-5B variant
    torch_dtype: torch.dtype = torch.bfloat16
).to(device="cuda")
```
Import
```python
from diffusers import CogVideoXPipeline
```
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | Required | Path or Hugging Face model ID for a CogVideoX-5B variant (e.g., `"THUDM/CogVideoX-5b"`) |
| `torch_dtype` | `torch.dtype` | `torch.bfloat16` | Model precision; bfloat16 recommended for memory efficiency |
Outputs
| Output | Type | Description |
|---|---|---|
| `pipe` | `CogVideoXPipeline` | Loaded pipeline on CUDA with VAE, transformer, text encoder, tokenizer, and scheduler |
Usage Examples
Example 1: Load from HuggingFace Hub
```python
from diffusers import CogVideoXPipeline
import torch

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")

# Access pipeline components
vae = pipe.vae
transformer = pipe.transformer
text_encoder = pipe.text_encoder
```
Example 2: Load from local path
```python
pipe = CogVideoXPipeline.from_pretrained(
    "/models/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")
```
Example 3: Verify scheduler compatibility
```python
from diffusers import CogVideoXDDIMScheduler, DDIMInverseScheduler

# The pipeline's default scheduler can be replaced for inversion
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
forward_scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config)
```
Related Pages
- Principle:Zai_org_CogVideo_DDIM_Pipeline_Loading -- Principle governing DDIM pipeline loading
- Environment:Zai_org_CogVideo_Diffusers_Inference_Environment
- Zai_org_CogVideo_Get_Video_Frames -- Previous step: video frame loading and preprocessing
- Zai_org_CogVideo_Encode_Video_Frames -- Next step: encoding frames using the pipeline's VAE
- Zai_org_CogVideo_DDIM_Inversion_Sample -- Inversion step using the loaded pipeline