Principle:Zai org CogVideo DDIM Video Export

Attribute	Value
Principle Name	DDIM Video Export
Workflow	Video Editing DDIM Inversion
Step	6 of 6
Type	Post-Processing
Repository	zai-org/CogVideo
Paper	CogVideoX
Last Updated	2026-02-10 00:00 GMT

Overview

DDIM Video Export is the technique of decoding diffusion latents back to pixel-space video and writing the result to MP4 files. After DDIM inversion and reconstruction, the latents at the final step of each trajectory are passed through the pipeline's VAE decoder, post-processed, and saved as playable video files.

Description

After DDIM inversion and reconstruction, the export process runs in four steps:

Latent extraction: The last entry of the trajectory tensor (index [-1], the final timestep) is taken as the result latents.
VAE decoding: The pipeline's decode_latents() method runs the latents through the 3D VAE decoder, converting from latent space to pixel space.
Post-processing: The pipeline's video_processor.postprocess_video() converts the decoded tensor to PIL Image format.
MP4 export: Frames are assembled into an MP4 video file at the specified FPS.
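
The four steps above can be sketched as a single helper. This is a minimal sketch, not the repository's confirmed implementation: the function name and the export_fn parameter are illustrative assumptions, and the pipeline object is assumed to expose decode_latents() and video_processor.postprocess_video() as described above, with postprocess_video() returning one list of frames per batch item.

```python
def export_latents_to_video(pipeline, latents, video_path, fps, export_fn):
    """Decode latents, convert them to frames, and write a video file.

    `export_fn` is a hypothetical injection point for the MP4 writer
    (for example, diffusers.utils.export_to_video could be passed in).
    """
    # VAE decoding: latent space -> pixel-space tensor.
    video = pipeline.decode_latents(latents)
    # Post-processing: tensor -> frames (assumed: one list per batch item).
    frames = pipeline.video_processor.postprocess_video(
        video=video, output_type="pil"
    )
    # MP4 export: write the first batch item at the requested frame rate.
    export_fn(frames[0], video_path, fps=fps)
    return frames[0]


# Hypothetical usage, with latent extraction done by the caller:
#   from diffusers.utils import export_to_video
#   export_latents_to_video(pipeline, trajectory[-1], "output.mp4", 8, export_to_video)
```

Passing the writer in as an argument keeps the sketch independent of any particular video backend.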

Two videos are typically produced in the DDIM editing workflow:

Inversion reconstruction: Generated from the inversion trajectory endpoint to verify reconstruction quality (should closely match the source video).
Edited reconstruction: Generated from the prompted reconstruction trajectory, containing the edited content.
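
As a small illustration of producing both outputs, the sketch below pairs the final latents of each trajectory with an output path. The function name and the file names inversion.mp4 and reconstruction.mp4 are hypothetical, and the trajectories are assumed to be indexable sequences ordered by timestep.

```python
from pathlib import Path


def plan_exports(inversion_trajectory, reconstruction_trajectory, output_dir):
    """Return (latents, path) pairs for the two videos to export."""
    out = Path(output_dir)
    return [
        # Verification video: last step of the inversion trajectory.
        (inversion_trajectory[-1], out / "inversion.mp4"),
        # Edited video: last step of the prompted reconstruction trajectory.
        (reconstruction_trajectory[-1], out / "reconstruction.mp4"),
    ]
```

Each pair can then be handed to whatever routine decodes the latents and writes the video.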

Usage

Use DDIM Video Export as the final step of the video editing pipeline, after both inversion and reconstruction are complete. Comparing the inversion reconstruction with the source video helps validate inversion quality before examining the edit results.
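
One way to make that comparison quantitative is a per-frame PSNR between the source video and the inversion reconstruction. The workflow does not prescribe a metric; PSNR is an assumption chosen here for illustration, and the sketch works on flat sequences of pixel values in [0, 255] rather than on image objects.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two pixel sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if len(reference) == 0:
        raise ValueError("sequences must not be empty")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match; identical frames give infinity.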

Theoretical Basis

The export process inverts the encoding pipeline:

Decoding chain:

latents -> decode_latents() -> postprocess_video() -> PIL frames -> MP4

pipeline.decode_latents() inverts the VAE encoding by:

Applying inverse scale factor: z = latents / scale_factor
Running through the 3D VAE decoder: x = VAE.decode(z)
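
The two steps can be written out as follows. This is a simplified sketch rather than the pipeline's actual method: vae_decode is a hypothetical callable standing in for the 3D VAE decoder, and any tensor layout changes the real method may perform are left out.

```python
def decode_latents_sketch(latents, vae_decode, scaling_factor):
    """Undo the latent scaling, then run the decoder."""
    # Inverse scale factor: z = latents / scale_factor.
    z = latents / scaling_factor
    # Decoder pass: x = VAE.decode(z).
    return vae_decode(z)
```

Because only division is used, the same code works for plain floats and for tensor types that support it.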

video_processor.postprocess_video() handles the tensor-to-image conversion:

Denormalization from [-1, 1] to [0, 255]
Format conversion to PIL Image objects
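
The denormalization step amounts to a linear map followed by clamping and rounding. The sketch below shows it for a single value; the function name is hypothetical, and the real method operates on whole tensors before building PIL images.

```python
def denormalize_pixel(value):
    """Map a decoder output in [-1, 1] to an 8-bit pixel value in [0, 255]."""
    # Shift and scale to [0, 1], clamping out-of-range decoder outputs.
    unit = min(max(value * 0.5 + 0.5, 0.0), 1.0)
    # Scale to 8-bit range and round to the nearest integer.
    return int(round(unit * 255))
```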

The frames are written to an MP4 container, typically with H.264 encoding, which provides a good balance of compression ratio and playback compatibility. The codec actually used depends on the export backend.

Related Pages

Implementation:Zai_org_CogVideo_DDIM_Export_Latents_To_Video -- Implementation of latent decoding and video export
Zai_org_CogVideo_Prompted_Reconstruction -- Previous step: prompted reconstruction with attention injection
Zai_org_CogVideo_DDIM_Inversion -- Inversion step whose trajectory is also exported for verification
Zai_org_CogVideo_Video_Encoding -- Encoding step that is inverted by the VAE decoder
