Implementation:Zai org CogVideo DDIM Inversion Sample

From Leeroopedia


| Attribute | Value |
|---|---|
| Implementation Name | DDIM Inversion Sample |
| Workflow | Video Editing DDIM Inversion |
| Step | 4 of 6 |
| Type | API Doc |
| Source File | inference/ddim_inversion.py:L321-452, inference/ddim_inversion.py:L489-498 |
| Repository | zai-org/CogVideo |
| External Dependencies | diffusers, torch |
| Last Updated | 2026-02-10 00:00 GMT |

Overview

Implementation of the DDIM inversion sampling function. The sample function serves a dual purpose: it performs DDIM inversion when called with the inverse scheduler and an empty prompt, and forward DDIM reconstruction when called with the forward scheduler and the edit prompt. In both modes the function stores the full latent trajectory for use in attention injection.

Description

The sample function implements the core DDIM loop:

  1. Sets up the scheduler timesteps for the specified number of inference steps
  2. Encodes the prompt (or empty string for inversion) using the pipeline's text encoder
  3. Iterates over the timesteps; at each step it:
    • Concatenates the latent with itself for classifier-free guidance (if guidance_scale > 1)
    • Runs the transformer forward pass to predict noise
    • Applies CFG to combine conditional and unconditional predictions
    • Steps the scheduler (forward for reconstruction, inverse for inversion)
    • Stores the latent in the trajectory
  4. Returns the complete trajectory tensor
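
The loop above can be illustrated with a minimal, self-contained sketch. This is not the repository's code: scalars stand in for latent tensors, `fake_noise` is a hypothetical stand-in for the transformer, the noise schedule is made up, and the two guidance branches are evaluated separately rather than in one batched forward pass. It only shows how the CFG combination, the deterministic DDIM update, and the trajectory bookkeeping fit together.

```python
import math

def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

def ddim_step(x, noise, alpha_from, alpha_to):
    """Deterministic DDIM update (eta = 0) between two cumulative-alpha levels.

    Denoising moves to a larger alpha; inversion reuses the same formula
    but moves to a smaller alpha (more noise).
    """
    x0_pred = (x - math.sqrt(1.0 - alpha_from) * noise) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * noise

def toy_sample(latent, alphas, predict_noise, guidance_scale=1.0):
    """Walk `alphas` pairwise and record the latent after every step."""
    trajectory = []
    for alpha_from, alpha_to in zip(alphas[:-1], alphas[1:]):
        noise_cond = predict_noise(latent, conditional=True)
        if guidance_scale > 1.0:
            # The real function batches both branches; here they run separately.
            noise_uncond = predict_noise(latent, conditional=False)
            noise = cfg_combine(noise_uncond, noise_cond, guidance_scale)
        else:
            noise = noise_cond
        latent = ddim_step(latent, noise, alpha_from, alpha_to)
        trajectory.append(latent)
    return trajectory

# Hypothetical stand-in for the transformer: a fixed prediction per branch.
def fake_noise(latent, conditional):
    return 0.5 if conditional else 0.1

clean_to_noisy = [0.99, 0.7, 0.4, 0.1]      # made-up schedule, inversion order
inverted = toy_sample(1.0, clean_to_noisy, fake_noise)

noisy_to_clean = clean_to_noisy[::-1]       # reconstruction order
restored = toy_sample(inverted[-1], noisy_to_clean, fake_noise)
print(restored[-1])
```

Because the toy noise prediction does not depend on the latent, the round trip recovers the starting value up to floating-point error. With a real model the prediction changes between the two passes, so reconstruction is only approximate.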

For inversion specifically (L489-498), the function is called with a DDIMInverseScheduler, an empty prompt, and reference_latents=None. The resulting trajectory is then reversed along its step dimension and passed as reference_latents to the reconstruction call.
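
To see why the trajectory is reversed, the sketch below uses string labels in place of latents. Inversion records latents from least to most noisy, while reconstruction walks from most to least noisy, so reversing makes entry i of the reference line up with reconstruction step i. The helper name `align_reference` and the timestep labels are hypothetical.

```python
def align_reference(inversion_trajectory):
    """Return a reversed copy so entry i matches reconstruction step i."""
    return list(reversed(inversion_trajectory))

# Labels stand in for latents; t is the noise level reached after each inversion step.
inversion_trajectory = ["latent@t=200", "latent@t=400", "latent@t=600", "latent@t=800"]
reference_latents = align_reference(inversion_trajectory)

for step, reference in enumerate(reference_latents):
    print(f"reconstruction step {step} reads {reference}")
```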

Usage

from inference.ddim_inversion import sample

# Inversion (assumes `pipe`, `encoded_latents`, and `inverse_scheduler` are already set up)
inversion_trajectory = sample(
    pipeline=pipe,
    latents=encoded_latents,
    scheduler=inverse_scheduler,
    prompt="",
    num_inference_steps=50,
)

Code Reference

Source Location

| File | Lines | Description |
|---|---|---|
| inference/ddim_inversion.py | L321-452 | sample function (main DDIM loop) |
| inference/ddim_inversion.py | L489-498 | Inversion call site |

Signature

def sample(
    pipeline: CogVideoXPipeline,
    latents: torch.FloatTensor,
    scheduler: Union[DDIMInverseScheduler, CogVideoXDDIMScheduler],
    prompt: Optional[str] = None,
    num_inference_steps: int = 50,
    guidance_scale: float = 6.0,
    generator: Optional[torch.Generator] = None,
    reference_latents: torch.FloatTensor = None,
) -> torch.FloatTensor:  # trajectory [num_steps, B, T, C, H', W']

Import

from inference.ddim_inversion import sample

I/O Contract

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| pipeline | CogVideoXPipeline | Required | Loaded CogVideoX pipeline with transformer, text encoder, and VAE |
| latents | torch.FloatTensor | Required | Starting latents: encoded video for inversion, or random noise for reconstruction |
| scheduler | Union[DDIMInverseScheduler, CogVideoXDDIMScheduler] | Required | Inverse scheduler for inversion, forward scheduler for reconstruction |
| prompt | Optional[str] | None | Text prompt; empty string for inversion, edit prompt for reconstruction |
| num_inference_steps | int | 50 | Number of DDIM steps |
| guidance_scale | float | 6.0 | Classifier-free guidance scale |
| generator | Optional[torch.Generator] | None | Random number generator for reproducibility |
| reference_latents | torch.FloatTensor | None | Inversion trajectory for attention injection during reconstruction; None for inversion |

Outputs

| Output | Type | Description |
|---|---|---|
| Return value | torch.FloatTensor | Latent trajectory tensor of shape [num_steps, B, T, C, H', W'] containing latents at each timestep |
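
As a rough guide to the size of the returned tensor, the sketch below computes the expected trajectory shape from the input video dimensions. The helper `expected_trajectory_shape` is hypothetical, and the compression factors are assumptions about the CogVideoX VAE (4x temporal, 8x spatial, 16 latent channels); check them against the loaded VAE config before relying on the numbers.

```python
def expected_trajectory_shape(num_steps, batch, num_frames, height, width,
                              temporal_factor=4, spatial_factor=8,
                              latent_channels=16):
    """Shape [num_steps, B, T, C, H', W'] under the assumed compression factors."""
    # The first frame is kept as-is; the remaining frames are compressed in time.
    latent_frames = (num_frames - 1) // temporal_factor + 1
    return (
        num_steps,
        batch,
        latent_frames,
        latent_channels,
        height // spatial_factor,
        width // spatial_factor,
    )

shape = expected_trajectory_shape(
    num_steps=50, batch=1, num_frames=49, height=480, width=720
)
print(shape)  # (50, 1, 13, 16, 60, 90)
```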

Usage Examples

Example 1: DDIM inversion (finding noise representation)

from diffusers import DDIMInverseScheduler
from inference.ddim_inversion import sample, encode_video_frames, get_video_frames

# `pipe` is assumed to be an already loaded CogVideoXPipeline
# Prepare inverse scheduler
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)

# Load and encode video
video_frames = get_video_frames("input.mp4")
latents = encode_video_frames(pipe.vae, video_frames)

# Run inversion
inversion_trajectory = sample(
    pipeline=pipe,
    latents=latents,
    scheduler=inverse_scheduler,
    prompt="",  # Empty prompt for unconditional inversion
    num_inference_steps=50,
    guidance_scale=1.0,  # No CFG during inversion
)
# inversion_trajectory.shape: [50, 1, T, 16, H', W']

Example 2: Forward reconstruction (verifying inversion quality)

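# Continues from Example 1: reuses `pipe` and `inversion_trajectory`
from diffusers import CogVideoXDDIMScheduler
from inference.ddim_inversion import sample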

forward_scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config)

# Reverse the trajectory for reconstruction
reversed_trajectory = inversion_trajectory.flip(0)

reconstruction = sample(
    pipeline=pipe,
    latents=inversion_trajectory[-1],  # Start from the final (noisiest) inverted latent
    scheduler=forward_scheduler,
    prompt="original prompt",
    num_inference_steps=50,
    reference_latents=reversed_trajectory,
)
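
One way to put a number on inversion quality is a relative error between the original encoded latents and the reconstructed ones. The sketch below is a hypothetical helper that works on flat sequences of floats. With tensors, one might pass `latents.flatten().tolist()` and `reconstruction[-1].flatten().tolist()`, assuming the last trajectory entry is the final reconstructed latent.

```python
import math

def relative_l2_error(original, reconstructed):
    """Return ||original - reconstructed|| / ||original|| for two flat sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))
    norm = math.sqrt(sum(a * a for a in original))
    if norm == 0.0:
        # Undefined ratio: report 0 for an exact match, infinity otherwise.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / norm

perfect = relative_l2_error([3.0, 4.0], [3.0, 4.0])
drifted = relative_l2_error([3.0, 4.0], [3.0, 3.0])
print(perfect, drifted)
```

A value near zero indicates that reconstruction closely follows the original latents. When the reconstruction prompt is an edit prompt rather than the original one, a larger error is expected and does not by itself indicate a faulty inversion.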
