Principle:Zai org CogVideo DDIM Video Export

From Leeroopedia


Attribute        Value
Principle Name   DDIM Video Export
Workflow         Video Editing DDIM Inversion
Step             6 of 6
Type             Post-Processing
Repository       zai-org/CogVideo
Paper            CogVideoX
Last Updated     2026-02-10 00:00 GMT

Overview

DDIM Video Export is a technique for decoding diffusion latents back to pixel-space video and exporting the result as MP4 files. After DDIM inversion and reconstruction, the latents at the final step of each trajectory are decoded by the pipeline's VAE decoder, post-processed into frames, and saved as playable video files.

Description

After DDIM inversion and reconstruction, the export process runs four steps:

  1. Latent extraction: The final step of the trajectory tensor is extracted as the result latents (index [-1] for the last timestep).
  2. VAE decoding: The pipeline's decode_latents() method runs the latents through the 3D VAE decoder, converting from latent space to pixel space.
  3. Post-processing: The pipeline's video_processor.postprocess_video() converts the decoded tensor into PIL Image frames (when called with output_type="pil").
  4. MP4 export: Frames are assembled into an MP4 video file at the specified FPS.
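
The four steps above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the repository's exact code: it assumes `pipeline` exposes `decode_latents()` and `video_processor.postprocess_video()` as described, that `trajectory` is indexable by step, and that `postprocess_video()` returns one list of frames per batch item. The name `export_last_step` and the `export_fn` parameter are hypothetical; when no exporter is passed, the sketch falls back to `diffusers.utils.export_to_video`.

```python
def export_last_step(pipeline, trajectory, video_path, fps, export_fn=None):
    """Decode the final trajectory step and write it to `video_path`.

    Hypothetical helper: `pipeline` is assumed to expose decode_latents()
    and video_processor.postprocess_video() as described above.
    """
    if export_fn is None:
        # Assumed default exporter, imported lazily so the sketch can be
        # exercised with a stub when diffusers is not installed.
        from diffusers.utils import export_to_video as export_fn

    # 1. Latent extraction: the last entry is the final timestep.
    latents = trajectory[-1]

    # 2. VAE decoding: latent space -> pixel space.
    video = pipeline.decode_latents(latents)

    # 3. Post-processing: decoded tensor -> PIL frames, one list per batch item.
    frames = pipeline.video_processor.postprocess_video(
        video=video, output_type="pil"
    )

    # 4. MP4 export: write the first batch item at the requested FPS.
    return export_fn(frames[0], video_path, fps=fps)
```

Passing the exporter as a parameter is only a convenience for testing the call order without a GPU or model weights; in the real workflow the default exporter would normally be used.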

Two videos are typically produced in the DDIM editing workflow:

  • Inversion reconstruction: Generated from the inversion trajectory endpoint to verify reconstruction quality (should closely match the source video).
  • Edited reconstruction: Generated from the prompted reconstruction trajectory, containing the edited content.

Usage

Use DDIM Video Export as the final step of the video editing pipeline, after both inversion and reconstruction are complete. Comparing the inversion reconstruction with the source video helps validate inversion quality before examining the edit results.
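
One way to make that comparison quantitative is a per-frame peak signal-to-noise ratio (PSNR). PSNR is not prescribed by the workflow; it is a metric swapped in here purely for illustration. The sketch assumes each frame has already been flattened to a sequence of 8-bit pixel values and that both videos have the same frame count and size. The function names `frame_psnr` and `mean_video_psnr` are hypothetical.

```python
import math


def frame_psnr(source, reconstructed, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(source) != len(reconstructed) or not source:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(source, reconstructed)) / len(source)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_video_psnr(source_frames, reconstructed_frames):
    """Average per-frame PSNR over two videos with the same frame count."""
    if len(source_frames) != len(reconstructed_frames) or not source_frames:
        raise ValueError("videos must be non-empty and have the same length")
    scores = [
        frame_psnr(s, r) for s, r in zip(source_frames, reconstructed_frames)
    ]
    return sum(scores) / len(scores)
```

A higher value indicates a closer match to the source. PIL frames would need to be converted to flat numeric sequences before being passed to these functions.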

Theoretical Basis

The export process inverts the encoding pipeline:

Decoding chain:

latents -> decode_latents() -> postprocess_video() -> PIL frames -> MP4

pipeline.decode_latents() inverts the VAE encoding by:

  1. Undoing the latent scaling applied at encoding time: z = latents / scale_factor, where scale_factor is the VAE's configured scaling factor
  2. Running through the 3D VAE decoder: x = VAE.decode(z)

video_processor.postprocess_video() handles the tensor-to-image conversion:

  1. Denormalization from [-1, 1] to [0, 1], followed by scaling to 8-bit values in [0, 255]
  2. Format conversion to PIL Image objects
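
The denormalization step can be illustrated on a single value. This is a sketch that assumes an affine map with clamping; the actual method operates on whole tensors, and the function name `denormalize_to_uint8` is hypothetical.

```python
def denormalize_to_uint8(value):
    """Map a decoded value in [-1, 1] to an integer in [0, 255]."""
    # Shift and scale [-1, 1] to [0, 1], clamping out-of-range inputs.
    unit = min(max(value / 2.0 + 0.5, 0.0), 1.0)
    # Scale to the 8-bit range and round to the nearest integer.
    return int(round(unit * 255.0))
```

Under this mapping, -1.0 becomes 0 and 1.0 becomes 255, and values outside [-1, 1] saturate at the ends of the range.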

The frames are written to an MP4 container. The video codec depends on the export backend; H.264, the typical default, provides a good balance of compression ratio and playback compatibility.
