
Implementation:Zai org CogVideo DDIM Attention Injection Reconstruction

From Leeroopedia


| Attribute | Value |
| --- | --- |
| Implementation Name | DDIM Attention Injection Reconstruction |
| Workflow | Video Editing DDIM Inversion |
| Step | 5 of 6 |
| Type | API Doc |
| Source File | inference/ddim_inversion.py:L118-243, inference/ddim_inversion.py:L246-260, inference/ddim_inversion.py:L499-509 |
| Repository | zai-org/CogVideo |
| External Dependencies | diffusers (CogVideoXAttnProcessor2_0, CogVideoXBlock, CogVideoXTransformer3DModel), torch.nn.functional |
| Last Updated | 2026-02-10 00:00 GMT |

Overview

Implementation of the prompted reconstruction step in the DDIM inversion video editing pipeline. This includes the custom attention processor (CogVideoXAttnProcessor2_0ForDDIMInversion), the context manager for attention processor replacement (OverrideAttnProcessors), and the reconstruction call that combines these components.

Description

Three components work together for prompted reconstruction:

  1. CogVideoXAttnProcessor2_0ForDDIMInversion (L118-243): Extends the standard CogVideoXAttnProcessor2_0 to inject reference attention features from the source video's inversion trajectory. During each attention computation, it blends reference keys/values with current keys/values.
  2. OverrideAttnProcessors (L246-260): A Python context manager that temporarily replaces all attention processors in the transformer with the DDIM inversion variant. On entry, it swaps processors; on exit, it restores the originals.
  3. Reconstruction call (L499-509): Uses the context manager and calls the sample function with the forward scheduler, edit prompt, and reversed inversion trajectory as reference latents.
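The swap-and-restore pattern of component 2 can be sketched with plain Python objects. This is a hedged illustration of the context manager protocol only; the dummy classes and the string stand-ins for processors are invented here, and the real OverrideAttnProcessors (L246-260) operates on CogVideoXBlock attention modules.

```python
class DummyAttention:
    """Stand-in for a transformer block's attention module (hypothetical)."""
    def __init__(self):
        self.processor = "standard"

class DummyTransformer:
    """Stand-in for CogVideoXTransformer3DModel with two blocks (hypothetical)."""
    def __init__(self):
        self.blocks = [DummyAttention(), DummyAttention()]

class OverrideAttnProcessorsSketch:
    """Minimal sketch of OverrideAttnProcessors: on entry, replace every
    block's attention processor with the DDIM inversion variant; on exit,
    restore the originals even if an exception was raised."""
    def __init__(self, transformer):
        self.transformer = transformer
        self._originals = {}

    def __enter__(self):
        for i, block in enumerate(self.transformer.blocks):
            self._originals[i] = block.processor
            block.processor = "ddim_inversion"  # stand-in for the custom processor
        return self.transformer

    def __exit__(self, exc_type, exc, tb):
        for i, block in enumerate(self.transformer.blocks):
            block.processor = self._originals[i]

transformer = DummyTransformer()
with OverrideAttnProcessorsSketch(transformer):
    inside = [b.processor for b in transformer.blocks]
outside = [b.processor for b in transformer.blocks]
print(inside)   # ['ddim_inversion', 'ddim_inversion']
print(outside)  # ['standard', 'standard']
```

Restoring in `__exit__` (rather than after the `with` body) is what guarantees the pipeline is usable for ordinary generation afterwards, even if sampling fails midway.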

Usage

import torch

from inference.ddim_inversion import OverrideAttnProcessors, sample

with OverrideAttnProcessors(pipe.transformer):
    reconstruction_trajectory = sample(
        pipeline=pipe,
        latents=torch.randn_like(latents),
        scheduler=pipe.scheduler,
        prompt=edit_prompt,
        reference_latents=reversed_inversion_trajectory,
    )

Code Reference

Source Location

| File | Lines | Description |
| --- | --- | --- |
| inference/ddim_inversion.py | L118-243 | CogVideoXAttnProcessor2_0ForDDIMInversion class |
| inference/ddim_inversion.py | L246-260 | OverrideAttnProcessors context manager |
| inference/ddim_inversion.py | L499-509 | Reconstruction call site |

Signature

class CogVideoXAttnProcessor2_0ForDDIMInversion(CogVideoXAttnProcessor2_0):
    """Custom attention processor that injects reference attention features."""

class OverrideAttnProcessors:
    """Context manager to temporarily replace attention processors."""
    def __init__(self, transformer: CogVideoXTransformer3DModel): ...

# Usage:
with OverrideAttnProcessors(pipe.transformer):
    reconstruction_trajectory = sample(
        pipeline=pipe,
        latents=torch.randn_like(latents),
        scheduler=pipe.scheduler,  # CogVideoXDDIMScheduler
        prompt=edit_prompt,
        reference_latents=reversed_inversion_trajectory,
    )

Import

from inference.ddim_inversion import (
    CogVideoXAttnProcessor2_0ForDDIMInversion,
    OverrideAttnProcessors,
    sample,
)

I/O Contract

Inputs

CogVideoXAttnProcessor2_0ForDDIMInversion

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| Inherited from CogVideoXAttnProcessor2_0 | -- | -- | All standard attention processor inputs (hidden_states, encoder_hidden_states, attention_mask, etc.) |
| Reference features | torch.FloatTensor | via reference_latents | Attention keys/values from the inversion trajectory at the current timestep |
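The "blends reference keys/values with current keys/values" behavior described above can be sketched as reference key/value injection inside scaled dot-product attention. This is a minimal illustration, not the actual processor: the real blending logic lives in CogVideoXAttnProcessor2_0ForDDIMInversion (L118-243), and the function name and tensor shapes here are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_with_reference_kv(q, k, v, ref_k, ref_v):
    """Hypothetical sketch: concatenate reference keys/values (taken from
    the source video's inversion trajectory) with the current keys/values
    along the sequence axis, so queries can attend to both."""
    k_cat = torch.cat([k, ref_k], dim=2)  # (B, heads, 2*S, D)
    v_cat = torch.cat([v, ref_v], dim=2)
    return F.scaled_dot_product_attention(q, k_cat, v_cat)

# Toy shapes: batch 1, 2 heads, sequence length 4, head dim 8
B, H, S, D = 1, 2, 4, 8
q = torch.randn(B, H, S, D)
out = attention_with_reference_kv(
    q,
    torch.randn(B, H, S, D), torch.randn(B, H, S, D),      # current K, V
    torch.randn(B, H, S, D), torch.randn(B, H, S, D),      # reference K, V
)
print(out.shape)  # torch.Size([1, 2, 4, 8])
```

The output keeps the query's sequence length; only the attended context grows, which is why the injection is transparent to the rest of the transformer block.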

OverrideAttnProcessors

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| transformer | CogVideoXTransformer3DModel | Required | The pipeline's transformer model whose attention processors will be replaced |

Reconstruction call

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pipeline | CogVideoXPipeline | Required | Loaded CogVideoX pipeline |
| latents | torch.FloatTensor | Required | Random noise tensor (same shape as encoded video latents) |
| scheduler | CogVideoXDDIMScheduler | Required | Forward DDIM scheduler |
| prompt | str | Required | Edit prompt describing the desired output |
| reference_latents | torch.FloatTensor | Required | Reversed inversion trajectory of shape [num_steps, B, T, C, H', W'] |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| reconstruction_trajectory | torch.FloatTensor | Reconstruction trajectory of shape [num_steps, B, T, C, H', W']; final step contains the edited video latents |
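Because the trajectory stacks one latent tensor per denoising step along dim 0, the edited video latents are simply the last slice. A small shape check, using hypothetical dimensions (50 steps, 13 latent frames, 16 channels, 60x90 spatial):

```python
import torch

# Hypothetical trajectory: [num_steps, B, T, C, H', W']
num_steps, B, T, C, H, W = 50, 1, 13, 16, 60, 90
reconstruction_trajectory = torch.zeros(num_steps, B, T, C, H, W)

# The final denoising step holds the edited video latents
edited_latents = reconstruction_trajectory[-1]
print(edited_latents.shape)  # torch.Size([1, 13, 16, 60, 90])
```

This is the slice passed to export_latents_to_video in Example 1 below.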

Usage Examples

Example 1: Full video editing pipeline

from diffusers import CogVideoXPipeline, CogVideoXDDIMScheduler, DDIMInverseScheduler
from inference.ddim_inversion import (
    get_video_frames, encode_video_frames, sample,
    OverrideAttnProcessors, export_latents_to_video,
)
import torch

# Load pipeline
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Prepare schedulers
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
forward_scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config)

# Load and encode video
video_frames = get_video_frames("input.mp4")
latents = encode_video_frames(pipe.vae, video_frames)

# Step 1: Inversion
inversion_trajectory = sample(
    pipe, latents, inverse_scheduler, prompt="", num_inference_steps=50
)

# Step 2: Reconstruction with edit
reversed_trajectory = inversion_trajectory.flip(0)

with OverrideAttnProcessors(pipe.transformer):
    reconstruction = sample(
        pipe,
        torch.randn_like(latents),
        forward_scheduler,
        prompt="A dog playing in snow",
        reference_latents=reversed_trajectory,
        num_inference_steps=50,
    )

# Export edited video
export_latents_to_video(pipe, reconstruction[-1], "edited_output.mp4")

Example 2: Using the context manager pattern

# The OverrideAttnProcessors context manager ensures
# original processors are restored after reconstruction
with OverrideAttnProcessors(pipe.transformer):
    # Inside: attention processors are replaced with DDIM inversion variants
    result = sample(pipe, noise, forward_scheduler, "new prompt",
                    reference_latents=ref)
# Outside: original processors are restored

# Pipeline can be used normally for other tasks
