# Implementation: Zai_org_CogVideo_Get_Video_Frames
| Attribute | Value |
|---|---|
| Implementation Name | Get Video Frames |
| Workflow | Video Editing DDIM Inversion |
| Step | 1 of 6 |
| Type | API Doc |
| Source File | inference/ddim_inversion.py:L263-300 |
| Repository | zai-org/CogVideo |
| External Dependencies | decord, torchvision.transforms |
| Last Updated | 2026-02-10 00:00 GMT |
## Overview
Implementation of video loading, frame sampling, resizing, and normalization for the DDIM inversion pipeline. The get_video_frames function produces a tensor of preprocessed frames ready for VAE encoding.
## Description

The `get_video_frames` function performs the following steps:

- Loads the video file using decord's `VideoReader`
- Applies start/end frame skipping
- Samples frames to the target count using uniform stepping or automatic stride calculation
- Resizes frames to the target resolution using torchvision transforms
- Normalizes pixel values from [0, 255] to [-1, 1]

The function enforces the VAE constraint that the frame count must satisfy (F mod 4) == 1.
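The sampling and constraint-enforcement steps above can be sketched in isolation. This is a hypothetical reconstruction, not the source function: the name `sample_frame_indices`, the floor-division stride rule, and the trailing-frame trimming are assumptions made for illustration.

```python
def sample_frame_indices(total_frames, skip_frames_start=0, skip_frames_end=0,
                         max_num_frames=81, frame_sample_step=None):
    # Hypothetical sketch of the index-selection logic described above;
    # the exact stride rounding in the source may differ.
    start = skip_frames_start
    end = total_frames - skip_frames_end
    if frame_sample_step is None:
        # Automatic stride: spread samples roughly evenly across the clip
        frame_sample_step = max((end - start) // max_num_frames, 1)
    indices = list(range(start, end, frame_sample_step))[:max_num_frames]
    # Enforce the VAE constraint (F mod 4) == 1 by dropping trailing frames
    while indices and len(indices) % 4 != 1:
        indices.pop()
    return indices
```

For a 300-frame clip with the defaults, this yields 81 evenly strided indices, satisfying 81 mod 4 == 1.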
## Usage

```python
from inference.ddim_inversion import get_video_frames

video_frames = get_video_frames(
    video_path="input_video.mp4",
    width=720,
    height=480,
    max_num_frames=81,
)
# video_frames shape: [F, C, H, W] in [-1, 1]
```
## Code Reference

### Source Location

| File | Lines | Description |
|---|---|---|
| inference/ddim_inversion.py | L263-300 | `get_video_frames` function |
### Signature

```python
def get_video_frames(
    video_path: str,
    width: int = 720,
    height: int = 480,
    skip_frames_start: int = 0,
    skip_frames_end: int = 0,
    max_num_frames: int = 81,
    frame_sample_step: Optional[int] = None,
) -> torch.FloatTensor:  # [F, C, H, W] in [-1, 1]
```
### Import

```python
from inference.ddim_inversion import get_video_frames
```
## I/O Contract

### Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| `video_path` | `str` | Required | Path to the input video file |
| `width` | `int` | `720` | Target width for resizing |
| `height` | `int` | `480` | Target height for resizing |
| `skip_frames_start` | `int` | `0` | Number of frames to skip at the beginning |
| `skip_frames_end` | `int` | `0` | Number of frames to skip at the end |
| `max_num_frames` | `int` | `81` | Maximum number of frames to sample (must satisfy F mod 4 == 1) |
| `frame_sample_step` | `Optional[int]` | `None` | Explicit frame sampling step; if `None`, computed automatically |
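The (F mod 4) == 1 constraint on `max_num_frames` can be validated before calling the function. The guard below is hypothetical (it does not exist in the source); only the constraint itself is documented:

```python
def check_frame_count(max_num_frames: int) -> None:
    # Hypothetical guard, not in the source: the VAE requires the sampled
    # frame count F to satisfy (F mod 4) == 1 (e.g. 49, 81).
    if max_num_frames % 4 != 1:
        nearest = max_num_frames - (max_num_frames - 1) % 4
        raise ValueError(
            f"max_num_frames={max_num_frames} violates (F mod 4) == 1; "
            f"nearest valid value below is {nearest}"
        )
```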
### Outputs

| Output | Type | Description |
|---|---|---|
| Return value | `torch.FloatTensor` | Video frames tensor of shape [F, C, H, W] with values in [-1, 1] |
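The [-1, 1] output range comes from the normalization step in the description. A minimal sketch of that mapping, assuming the common scale-and-shift form (the source may implement it via a transform instead):

```python
import torch

# uint8 frames in [0, 255], shaped [F, C, H, W] (small synthetic example)
frames_uint8 = torch.randint(0, 256, (5, 3, 8, 8), dtype=torch.uint8)

# Scale-and-shift normalization: 0 maps to -1.0, 255 maps to 1.0
frames = frames_uint8.float() / 127.5 - 1.0
```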
## Usage Examples

### Example 1: Default loading

```python
from inference.ddim_inversion import get_video_frames

frames = get_video_frames("input.mp4")
# frames.shape: [81, 3, 480, 720]
# frames.dtype: torch.float32
# frames.min() >= -1.0, frames.max() <= 1.0
```
### Example 2: Custom resolution and frame count

```python
frames = get_video_frames(
    "input.mp4",
    width=1360,
    height=768,
    max_num_frames=49,
    skip_frames_start=10,
    skip_frames_end=5,
)
# frames.shape: [49, 3, 768, 1360]
```
### Example 3: Explicit frame sampling step

```python
frames = get_video_frames(
    "input.mp4",
    frame_sample_step=3,  # Take every 3rd frame
    max_num_frames=25,
)
```
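The per-frame resize described earlier can be sketched with core torch ops. The source uses torchvision transforms; `torch.nn.functional.interpolate` is shown here as a stand-in, so the exact interpolation mode and output values may differ from the real function:

```python
import torch
import torch.nn.functional as F

# A small batch of frames [F, C, H, W] at a 1080p source resolution
frames = torch.rand(2, 3, 1080, 1920)

# Resize every frame to the default target (height, width) = (480, 720)
resized = F.interpolate(frames, size=(480, 720), mode="bilinear",
                        align_corners=False)
```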
## Related Pages
- Principle:Zai_org_CogVideo_Video_Loading_and_Preprocessing -- Principle governing video loading and preprocessing
- Environment:Zai_org_CogVideo_Diffusers_Inference_Environment
- Zai_org_CogVideo_Encode_Video_Frames -- Next step: encoding frames to latent space
- Zai_org_CogVideo_DDIM_CogVideoXPipeline_From_Pretrained -- Pipeline providing the VAE for subsequent encoding