Principle:Zai org CogVideo DDIM Video Export
| Attribute | Value |
|---|---|
| Principle Name | DDIM Video Export |
| Workflow | Video Editing DDIM Inversion |
| Step | 6 of 6 |
| Type | Post-Processing |
| Repository | zai-org/CogVideo |
| Paper | CogVideoX |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Technique for decoding diffusion latents back to pixel-space video and exporting to MP4 files. After DDIM inversion and reconstruction, the latent tensors at the final trajectory step are decoded through the pipeline's VAE decoder, post-processed, and saved as playable video files.
Description
After DDIM inversion and reconstruction, the export process consists of four steps:
- Latent extraction: The final step of the trajectory tensor is extracted as the result latents (index `[-1]` for the last timestep).
- VAE decoding: The pipeline's `decode_latents()` method runs the latents through the 3D VAE decoder, converting from latent space to pixel space.
- Post-processing: The pipeline's `video_processor.postprocess_video()` converts the decoded tensor to PIL Image format.
- MP4 export: Frames are assembled into an MP4 video file at the specified FPS.
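The four steps above can be sketched as a single helper. This is a minimal illustration, not the repository's implementation: the function name, the `trajectory` argument, and the injectable `export_fn` parameter are assumptions made so the sketch stays self-contained. It also assumes that `postprocess_video()` returns one list of frames per batch item.

```python
from typing import Any, Callable, Sequence


def export_latents_to_video(
    pipeline: Any,
    trajectory: Sequence[Any],
    video_path: str,
    fps: int,
    export_fn: Callable[..., Any],
) -> str:
    """Decode the last trajectory step and hand the frames to `export_fn`."""
    # 1. Latent extraction: the final trajectory step holds the result latents.
    latents = trajectory[-1]
    # 2. VAE decoding: latent space -> pixel space.
    video = pipeline.decode_latents(latents)
    # 3. Post-processing: decoded tensor -> PIL frames (assumed batch-first).
    frames = pipeline.video_processor.postprocess_video(video=video, output_type="pil")
    # 4. MP4 export: write the first batch item at the requested FPS.
    export_fn(frames[0], video_path, fps=fps)
    return video_path
```

With diffusers installed, `export_fn` could be `diffusers.utils.export_to_video`, which accepts a list of frames, an output path, and an `fps` keyword.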
Two videos are typically produced in the DDIM editing workflow:
- Inversion reconstruction: Generated from the inversion trajectory endpoint to verify reconstruction quality (should closely match the source video).
- Edited reconstruction: Generated from the prompted reconstruction trajectory, containing the edited content.
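As a small illustration of producing the two outputs side by side, the sketch below derives one output path per video from the source filename. The helper name and the `_inversion` / `_reconstruction` suffixes are hypothetical naming choices for this example.

```python
import os
from typing import Dict


def plan_export_paths(source_video_path: str, output_dir: str) -> Dict[str, str]:
    """Return one MP4 path per exported video, keyed by its role."""
    # Reuse the source file's stem so both outputs are easy to pair with it.
    stem, _ = os.path.splitext(os.path.basename(source_video_path))
    return {
        # Decoded from the last step of the inversion trajectory.
        "inversion": os.path.join(output_dir, f"{stem}_inversion.mp4"),
        # Decoded from the last step of the prompted reconstruction trajectory.
        "reconstruction": os.path.join(output_dir, f"{stem}_reconstruction.mp4"),
    }
```

Each path would then be passed, together with the last step of the matching trajectory, to the export routine.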
Usage
Use DDIM Video Export as the final step of the video editing pipeline, after both inversion and reconstruction are complete. Comparing the inversion reconstruction with the source video helps validate inversion quality before examining the edit results.
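One way to make the comparison with the source video quantitative is a per-frame PSNR score. The sketch below is an assumption of this example rather than part of the workflow: it treats each frame as a flat sequence of 8-bit pixel values, and a higher score indicates a closer match.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], candidate: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio between two equally sized pixel sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Averaging this score over all frames gives a single number per video, which is convenient for checking the inversion reconstruction before inspecting the edit.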
Theoretical Basis
The export process inverts the encoding pipeline:
Decoding chain:
latents -> decode_latents() -> postprocess_video() -> PIL frames -> MP4
`pipeline.decode_latents()` inverts the VAE encoding by:
- Applying inverse scale factor: `z = latents / scale_factor`
- Running through the 3D VAE decoder: `x = VAE.decode(z)`
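The order of operations can be shown with a toy round trip. In this sketch, `vae_encode` and `vae_decode` are hypothetical stand-ins for the real 3D VAE, and the scale factor is an arbitrary example value. A real VAE is lossy, so the exact round trip holds only for these toy functions.

```python
from typing import Any, Callable


def encode_to_latents(pixels: Any, vae_encode: Callable[[Any], Any], scale_factor: float) -> Any:
    """Encoding direction: run the encoder, then multiply by the scale factor."""
    return vae_encode(pixels) * scale_factor


def decode_latents(latents: Any, vae_decode: Callable[[Any], Any], scale_factor: float) -> Any:
    """Decoding direction: undo the scaling first, then run the decoder."""
    z = latents / scale_factor  # z = latents / scale_factor
    return vae_decode(z)        # x = VAE.decode(z)


# Toy, exactly invertible stand-ins for the VAE (assumptions for illustration).
toy_encode = lambda x: x / 4
toy_decode = lambda z: z * 4

latents = encode_to_latents(2.0, toy_encode, scale_factor=0.5)
restored = decode_latents(latents, toy_decode, scale_factor=0.5)
```

Decoding applies the inverse operations in reverse order: the scale factor was applied last during encoding, so it is removed first during decoding.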
`video_processor.postprocess_video()` handles the tensor-to-image conversion:
- Denormalization from `[-1, 1]` to `[0, 255]`
- Format conversion to PIL Image objects
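The denormalization step can be illustrated in plain Python. This is a sketch of the arithmetic only, not the library's code: it assumes that out-of-range values are clamped, and it represents a frame as nested lists rather than a tensor.

```python
from typing import List


def denormalize(value: float) -> int:
    """Map a value in [-1, 1] to an 8-bit intensity in [0, 255]."""
    # Shift and scale to [0, 1], clamping anything outside the expected range.
    unit = min(max(value / 2 + 0.5, 0.0), 1.0)
    # Scale to the 8-bit range and round to the nearest integer.
    return int(round(unit * 255))


def denormalize_frame(frame: List[List[float]]) -> List[List[int]]:
    """Apply `denormalize` to every value of a 2D frame."""
    return [[denormalize(v) for v in row] for row in frame]
```

The resulting integer values are what an 8-bit image object such as a PIL Image stores per channel.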
The exported MP4 is typically H.264-encoded, which provides a good balance of compression ratio and playback compatibility. The codec is chosen by the export backend rather than by the MP4 container itself.
Related Pages
- Implementation:Zai_org_CogVideo_DDIM_Export_Latents_To_Video -- Implementation of latent decoding and video export
- Zai_org_CogVideo_Prompted_Reconstruction -- Previous step: prompted reconstruction with attention injection
- Zai_org_CogVideo_DDIM_Inversion -- Inversion step whose trajectory is also exported for verification
- Zai_org_CogVideo_Video_Encoding -- Encoding step that is inverted by the VAE decoder