
Implementation:Zai org CogVideo Get Video Frames

From Leeroopedia


Attribute Value
Implementation Name Get Video Frames
Workflow Video Editing DDIM Inversion
Step 1 of 6
Type API Doc
Source File inference/ddim_inversion.py:L263-300
Repository zai-org/CogVideo
External Dependencies decord, torchvision.transforms
Last Updated 2026-02-10 00:00 GMT

Overview

Implementation of video loading, frame sampling, resizing, and normalization for the DDIM inversion pipeline. The get_video_frames function produces a tensor of preprocessed frames ready for VAE encoding.

Description

The get_video_frames function performs the following steps:

  1. Loads the video file using decord's VideoReader
  2. Applies start/end frame skipping
  3. Samples frames to the target count using uniform stepping or automatic stride calculation
  4. Resizes frames to the target resolution using torchvision transforms
  5. Normalizes pixel values from [0, 255] to [-1, 1]

The function enforces the VAE constraint that the frame count must satisfy (F mod 4) == 1.
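The five steps above can be sketched in NumPy on a synthetic clip. This is an illustration, not the source implementation: decord decoding (step 1) and the torchvision resize (step 4) are omitted, and all names here are hypothetical.

```python
import numpy as np

# Illustrative sketch of steps 2, 3, and 5; names are hypothetical and
# the exact stride rule in inference/ddim_inversion.py may differ.
def preprocess(frames, max_num_frames, step=None, skip_start=0, skip_end=0):
    # Step 2: drop frames at either end of the clip.
    frames = frames[skip_start:len(frames) - skip_end]
    # Step 3: sample to at most max_num_frames with a uniform stride.
    if step is None:
        step = max(1, len(frames) // max_num_frames)
    frames = frames[::step][:max_num_frames]
    # Step 5: rescale uint8 pixels from [0, 255] to [-1, 1].
    return frames.astype(np.float32) / 127.5 - 1.0

# Synthetic 100-frame clip at a small stand-in resolution.
clip = np.random.randint(0, 256, size=(100, 48, 80, 3), dtype=np.uint8)
out = preprocess(clip, max_num_frames=81, skip_start=10, skip_end=5)
print(out.shape)  # (81, 48, 80, 3)
```

With 85 frames left after skipping, the computed stride is 1 and the first 81 frames are kept; the output values lie in [-1, 1].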

Usage

from inference.ddim_inversion import get_video_frames

video_frames = get_video_frames(
    video_path="input_video.mp4",
    width=720,
    height=480,
    max_num_frames=81,
)
# video_frames shape: [F, C, H, W] in [-1, 1]

Code Reference

Source Location

File Lines Description
inference/ddim_inversion.py L263-300 get_video_frames function

Signature

def get_video_frames(
    video_path: str,
    width: int = 720,
    height: int = 480,
    skip_frames_start: int = 0,
    skip_frames_end: int = 0,
    max_num_frames: int = 81,
    frame_sample_step: Optional[int] = None,
) -> torch.FloatTensor:  # [F, C, H, W] in [-1, 1]

Import

from inference.ddim_inversion import get_video_frames

I/O Contract

Inputs

Parameter Type Default Description
video_path str Required Path to the input video file
width int 720 Target width for resizing
height int 480 Target height for resizing
skip_frames_start int 0 Number of frames to skip at the beginning
skip_frames_end int 0 Number of frames to skip at the end
max_num_frames int 81 Maximum number of frames to sample (must satisfy F mod 4 == 1)
frame_sample_step Optional[int] None Explicit frame sampling step; if None, computed automatically
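When frame_sample_step is None, the stride is derived from the clip length. A plausible sketch of such a rule follows; the exact formula lives in the source file and may differ.

```python
# Assumed rule (illustrative only): choose the smallest uniform stride
# that keeps the sampled count near max_num_frames.
def auto_step(available_frames: int, max_num_frames: int) -> int:
    return max(1, available_frames // max_num_frames)

print(auto_step(300, 81))  # 3: take every 3rd frame of a 300-frame clip
print(auto_step(50, 81))   # 1: short clips are sampled densely
```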

Outputs

Output Type Description
Return value torch.FloatTensor Video frames tensor of shape [F, C, H, W] with values in [-1, 1]
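The (F mod 4) == 1 requirement on max_num_frames can be illustrated with a small hypothetical helper (clamp_to_vae_frames is not part of the CogVideo source):

```python
# Hypothetical helper illustrating the (F mod 4) == 1 VAE constraint.
def clamp_to_vae_frames(requested: int) -> int:
    """Round a frame count down to the nearest F with F % 4 == 1."""
    if requested < 1:
        raise ValueError("need at least one frame")
    return requested - ((requested - 1) % 4)

print(clamp_to_vae_frames(81))  # 81: already satisfies 81 % 4 == 1
print(clamp_to_vae_frames(50))  # 49: nearest valid count below 50
```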

Usage Examples

Example 1: Default loading

from inference.ddim_inversion import get_video_frames

frames = get_video_frames("input.mp4")
# frames.shape: [81, 3, 480, 720] (assuming the clip has at least 81 frames)
# frames.dtype: torch.float32
# frames.min() >= -1.0, frames.max() <= 1.0

Example 2: Custom resolution and frame count

frames = get_video_frames(
    "input.mp4",
    width=1360,
    height=768,
    max_num_frames=49,
    skip_frames_start=10,
    skip_frames_end=5,
)
# frames.shape: [49, 3, 768, 1360]

Example 3: Explicit frame sampling step

frames = get_video_frames(
    "input.mp4",
    frame_sample_step=3,  # Take every 3rd frame
    max_num_frames=25,
)
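The Overview notes that the output is ready for VAE encoding. Video VAEs typically consume a batched [B, C, F, H, W] tensor, so a common follow-up is to move the frame axis and add a batch dimension. The sketch below uses NumPy on a small stand-in array; the actual pipeline operates on torch tensors.

```python
import numpy as np

# Stand-in for the [F, C, H, W] output of get_video_frames.
frames = np.zeros((49, 3, 48, 80), dtype=np.float32)
# Move frames to axis 1 and prepend a batch axis: [B, C, F, H, W].
vae_input = frames.transpose(1, 0, 2, 3)[None]
print(vae_input.shape)  # (1, 3, 49, 48, 80)
```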
