Implementation: CogVideoXPipeline.from_pretrained (zai-org/CogVideo)
Overview
Loads the CogVideoX text-to-video pipeline from pretrained weights using the diffusers library. This is the entry point for all CogVideoX text-to-video inference workflows.
Source
inference/cli_demo.py:L122
Signature
pipe = CogVideoXPipeline.from_pretrained(
model_path: str, # HF model ID e.g. "THUDM/CogVideoX1.5-5B"
torch_dtype: torch.dtype = torch.bfloat16
) -> CogVideoXPipeline
Supported Models
| Model ID | Parameters | Default Resolution | Recommended dtype |
|---|---|---|---|
| THUDM/CogVideoX-2b | 2B | 480 x 720 | torch.float16 |
| THUDM/CogVideoX-5b | 5B | 480 x 720 | torch.bfloat16 |
| THUDM/CogVideoX1.5-5B | 5B | 768 x 1360 | torch.bfloat16 |
Resolution Map
# cogvideox-2b / 5b -> 480 x 720
# cogvideox1.5-5b -> 768 x 1360
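The resolution map and the dtype column above can be sketched as a small lookup helper. The helper name and the dictionary are illustrative (not part of the CLI), and dtypes are returned as strings so the sketch runs without torch installed:

```python
# Hypothetical helper mapping a model ID to its default resolution and
# recommended dtype, following the tables above.
DEFAULTS = {
    "THUDM/CogVideoX-2b":    {"height": 480, "width": 720,  "dtype": "torch.float16"},
    "THUDM/CogVideoX-5b":    {"height": 480, "width": 720,  "dtype": "torch.bfloat16"},
    "THUDM/CogVideoX1.5-5B": {"height": 768, "width": 1360, "dtype": "torch.bfloat16"},
}

def model_defaults(model_path: str) -> dict:
    """Return default resolution and recommended dtype for a known model ID."""
    try:
        return DEFAULTS[model_path]
    except KeyError:
        raise ValueError(f"Unknown CogVideoX model: {model_path}")
```

For example, `model_defaults("THUDM/CogVideoX1.5-5B")["width"]` yields 1360, matching the CogVideoX1.5 default resolution of 768 x 1360.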
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | (required) | HuggingFace model ID or local path to pretrained checkpoint |
| torch_dtype | torch.dtype | torch.bfloat16 | Data type for model weights. Use torch.bfloat16 for 5B models, torch.float16 for the 2B model. |
Inputs
- Model identifier string -- A HuggingFace model ID (e.g.,
"THUDM/CogVideoX1.5-5B") or local filesystem path pointing to a pretrained checkpoint directory.
Outputs
- CogVideoXPipeline instance -- A fully initialized pipeline object with the following sub-components loaded:
  - tokenizer -- T5 tokenizer
  - text_encoder -- T5-XXL text encoder
  - transformer -- CogVideoX 3D transformer denoiser
  - vae -- CogVideoX VAE encoder/decoder
  - scheduler -- Default noise scheduler
Usage Example
import torch
from diffusers import CogVideoXPipeline
# Load the pipeline from pretrained weights
pipe = CogVideoXPipeline.from_pretrained(
"THUDM/CogVideoX1.5-5B",
torch_dtype=torch.bfloat16
)
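Loading only instantiates the pipeline; a follow-on generation call looks roughly like the sketch below. The prompt and sampler values (num_inference_steps, guidance_scale) are illustrative defaults, not tuned settings from the CLI, and the try/except guard lets the sketch degrade gracefully on machines without diffusers or a CUDA device:

```python
# Hedged sketch of a full generation pass; inference is only attempted
# when diffusers is importable and a CUDA device is present.
try:
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video
    CAN_RUN = torch.cuda.is_available()
except ImportError:
    CAN_RUN = False

if CAN_RUN:
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()   # trades speed for lower VRAM use
    video = pipe(
        prompt="A panda playing guitar in a bamboo forest",  # example prompt
        num_inference_steps=50,       # illustrative, not tuned
        guidance_scale=6.0,
    ).frames[0]
    export_to_video(video, "output.mp4", fps=8)
```

`enable_model_cpu_offload()` keeps only the active sub-module on the GPU, which matters for the 5B models on consumer cards; remove it if VRAM is plentiful.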
Import
from diffusers import CogVideoXPipeline