Implementation:Zai org CogVideo DDIM CogVideoXPipeline From Pretrained
| Attribute | Value |
|---|---|
| Implementation Name | DDIM CogVideoXPipeline From Pretrained |
| Workflow | Video Editing DDIM Inversion |
| Step | 2 of 6 |
| Type | Wrapper Doc |
| Source File | inference/ddim_inversion.py:L474-478 |
| Repository | zai-org/CogVideo |
| External Dependencies | diffusers (CogVideoXPipeline) |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Implementation of the CogVideoX pipeline loading for DDIM inversion. The pipeline is loaded from a pretrained CogVideoX-5B model path with bfloat16 precision and moved directly to CUDA.
Description
The pipeline loading is a straightforward call to CogVideoXPipeline.from_pretrained with two key constraints:
- Model variant: Must use a CogVideoX-5B variant (requires rotary positional embeddings). The 2B variant is not supported.
- No CPU offloading: The pipeline is loaded directly to the GPU via `.to(device="cuda")` rather than `enable_model_cpu_offload()`, since both the forward and inverse passes run in the same session.

The loaded pipeline provides access to all components needed for DDIM inversion: `pipe.vae`, `pipe.transformer`, `pipe.text_encoder`, `pipe.tokenizer`, and `pipe.scheduler`.
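The 5B-only constraint above can be checked before the expensive `from_pretrained` call. The helper below is a hypothetical sketch written for this page, not part of the repository; it only inspects the model path string:

```python
def assert_5b_variant(model_path: str) -> None:
    # Hypothetical guard: DDIM inversion here requires the rotary positional
    # embeddings that only the CogVideoX-5B checkpoints provide, so reject
    # 2B paths early rather than after the weights have been downloaded.
    if "5b" not in model_path.lower():
        raise ValueError(
            f"Expected a CogVideoX-5B variant, got {model_path!r}; "
            "the 2B variant is not supported for DDIM inversion."
        )

assert_5b_variant("THUDM/CogVideoX-5b")  # passes silently
```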
Usage
```python
from diffusers import CogVideoXPipeline
import torch

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")
```
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| `inference/ddim_inversion.py` | L474-478 | Pipeline loading |
Signature
```python
pipe = CogVideoXPipeline.from_pretrained(
    model_path: str,                           # must be a CogVideoX-5B variant
    torch_dtype: torch.dtype = torch.bfloat16
).to(device="cuda")
```
Import
```python
from diffusers import CogVideoXPipeline
```
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | Required | Path or Hugging Face model ID for a CogVideoX-5B variant (e.g., `"THUDM/CogVideoX-5b"`) |
| `torch_dtype` | `torch.dtype` | `torch.bfloat16` | Model precision; bfloat16 recommended for memory efficiency |
Outputs
| Output | Type | Description |
|---|---|---|
| `pipe` | `CogVideoXPipeline` | Loaded pipeline on CUDA with VAE, transformer, text encoder, tokenizer, and scheduler |
Usage Examples
Example 1: Load from HuggingFace Hub
```python
from diffusers import CogVideoXPipeline
import torch

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")

# Access pipeline components
vae = pipe.vae
transformer = pipe.transformer
text_encoder = pipe.text_encoder
```
Example 2: Load from local path
```python
pipe = CogVideoXPipeline.from_pretrained(
    "/models/CogVideoX-5b",
    torch_dtype=torch.bfloat16
).to(device="cuda")
```
Example 3: Verify scheduler compatibility
```python
from diffusers import CogVideoXDDIMScheduler, DDIMInverseScheduler

# The pipeline's default scheduler can be replaced for inversion
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
forward_scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config)
```
Related Pages
- Principle:Zai_org_CogVideo_DDIM_Pipeline_Loading -- Principle governing DDIM pipeline loading
- Environment:Zai_org_CogVideo_Diffusers_Inference_Environment
- Zai_org_CogVideo_Get_Video_Frames -- Previous step: video frame loading and preprocessing
- Zai_org_CogVideo_Encode_Video_Frames -- Next step: encoding frames using the pipeline's VAE
- Zai_org_CogVideo_DDIM_Inversion_Sample -- Inversion step using the loaded pipeline