Implementation: CogVideoXPipeline.from_pretrained (zai-org/CogVideo)
Overview
Loads the CogVideoX text-to-video pipeline from pretrained weights using the diffusers library. This is the entry point for all CogVideoX text-to-video inference workflows.
Source
inference/cli_demo.py:L122
Signature
pipe = CogVideoXPipeline.from_pretrained(
model_path: str, # HF model ID e.g. "THUDM/CogVideoX1.5-5B"
torch_dtype: torch.dtype = torch.bfloat16
) -> CogVideoXPipeline
Supported Models
| Model ID | Parameters | Default Resolution | Recommended dtype |
|---|---|---|---|
| THUDM/CogVideoX-2b | 2B | 480 x 720 | torch.float16 |
| THUDM/CogVideoX-5b | 5B | 480 x 720 | torch.bfloat16 |
| THUDM/CogVideoX1.5-5B | 5B | 768 x 1360 | torch.bfloat16 |
Resolution Map
# cogvideox-2b / 5b -> 480 x 720
# cogvideox1.5-5b -> 768 x 1360
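The resolution map and the dtype column above can be sketched as a small lookup helper. The helper name and the dictionary are illustrative (not part of the CLI), and dtypes are returned as strings so the sketch runs without torch installed:

```python
# Hypothetical helper mapping a model ID to its default resolution and
# recommended dtype, following the tables above.
DEFAULTS = {
    "THUDM/CogVideoX-2b":    {"height": 480, "width": 720,  "dtype": "torch.float16"},
    "THUDM/CogVideoX-5b":    {"height": 480, "width": 720,  "dtype": "torch.bfloat16"},
    "THUDM/CogVideoX1.5-5B": {"height": 768, "width": 1360, "dtype": "torch.bfloat16"},
}

def model_defaults(model_path: str) -> dict:
    """Return default resolution and recommended dtype for a known model ID."""
    try:
        return DEFAULTS[model_path]
    except KeyError:
        raise ValueError(f"Unknown CogVideoX model: {model_path}")
```

For example, `model_defaults("THUDM/CogVideoX1.5-5B")["width"]` yields 1360, matching the CogVideoX1.5 default resolution of 768 x 1360.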
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | (required) | HuggingFace model ID or local path to pretrained checkpoint |
| torch_dtype | torch.dtype | torch.bfloat16 | Data type for model weights. Use torch.bfloat16 for 5B models, torch.float16 for the 2B model. |
Inputs
- Model identifier string -- A HuggingFace model ID (e.g.,
"THUDM/CogVideoX1.5-5B") or local filesystem path pointing to a pretrained checkpoint directory.
Outputs
- CogVideoXPipeline instance -- A fully initialized pipeline object with the following sub-components loaded:
  - tokenizer -- T5 tokenizer
  - text_encoder -- T5-XXL text encoder
  - transformer -- CogVideoX 3D transformer denoiser
  - vae -- CogVideoX VAE encoder/decoder
  - scheduler -- Default noise scheduler
Usage Example
import torch
from diffusers import CogVideoXPipeline
# Load the pipeline from pretrained weights
pipe = CogVideoXPipeline.from_pretrained(
"THUDM/CogVideoX1.5-5B",
torch_dtype=torch.bfloat16
)
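Loading only instantiates the pipeline; a follow-on generation call looks roughly like the sketch below. The prompt and sampler values (num_inference_steps, guidance_scale) are illustrative defaults, not tuned settings from the CLI, and the try/except guard lets the sketch degrade gracefully on machines without diffusers or a CUDA device:

```python
# Hedged sketch of a full generation pass; inference is only attempted
# when diffusers is importable and a CUDA device is present.
try:
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video
    CAN_RUN = torch.cuda.is_available()
except ImportError:
    CAN_RUN = False

if CAN_RUN:
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()   # trades speed for lower VRAM use
    video = pipe(
        prompt="A panda playing guitar in a bamboo forest",  # example prompt
        num_inference_steps=50,       # illustrative, not tuned
        guidance_scale=6.0,
    ).frames[0]
    export_to_video(video, "output.mp4", fps=8)
```

`enable_model_cpu_offload()` keeps only the active sub-module on the GPU, which matters for the 5B models on consumer cards; remove it if VRAM is plentiful.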
Import
from diffusers import CogVideoXPipeline