Implementation:Zai org CogVideo SAT Diffusion Sample

From Leeroopedia


| Attribute | Value |
|---|---|
| Implementation Name | SAT Diffusion Sample |
| Workflow | SAT Video Generation |
| Step | 4 of 5 |
| Type | API Doc |
| Source File | sat/diffusion_video.py:L250-287, sat/sgm/modules/diffusionmodules/sampling.py:L25-43 |
| Repository | zai-org/CogVideo |
| External Dependencies | torch, sat.mpu, sgm.modules.diffusionmodules.sampling |
| Last Updated | 2026-02-10 00:00 GMT |

Overview

This page documents the sample method of the SATVideoDiffusionEngine class. The method orchestrates the iterative denoising process using the EulerEDM sampler, applies classifier-free guidance, and accepts optional image conditioning for image-to-video (I2V) generation.

Description

The sample method:

  1. Initializes random Gaussian noise of the specified shape
  2. Delegates to the configured sampler (EulerEDM) via BaseDiffusionSampler
  3. Lets the sampler iterate over the timestep schedule, calling the denoiser at each step
  4. Applies classifier-free guidance at each step by running both conditional and unconditional forward passes
  5. For I2V, concatenates concat_images to the noisy latents along the channel dimension and passes ofs as the temporal offset embedding
  6. Returns the final denoised latent tensor

The BaseDiffusionSampler class in sat/sgm/modules/diffusionmodules/sampling.py (L25-43) defines the abstract sampling interface and timestep scheduling.
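The loop described above can be sketched in a few lines. This is a minimal illustration of a generic Euler step combined with classifier-free guidance, not the code in sampling.py. The names cfg_combine, euler_sample, and toy_denoise are hypothetical, as are the noise schedule and the guidance scale. The functions use plain arithmetic, so the toy run works on floats, and the same expressions would apply elementwise to tensors.

```python
from typing import Callable, Sequence


def cfg_combine(cond_pred, uncond_pred, scale: float):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `scale`."""
    return uncond_pred + scale * (cond_pred - uncond_pred)


def euler_sample(x, sigmas: Sequence[float], denoise: Callable, scale: float):
    """Toy Euler loop over a decreasing noise schedule.

    `denoise(x, sigma, conditional)` is a hypothetical callable that returns
    the model's estimate of the clean sample at noise level `sigma`.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Two forward passes per step: conditional and unconditional.
        denoised = cfg_combine(
            denoise(x, sigma, True), denoise(x, sigma, False), scale
        )
        d = (x - denoised) / sigma        # slope dx/dsigma at the current point
        x = x + d * (sigma_next - sigma)  # Euler update toward sigma_next
    return x


def toy_denoise(x, sigma, conditional):
    """Stand-in denoiser: predicts 1.0 with conditioning, 0.0 without."""
    return 1.0 if conditional else 0.0


# Assumed schedule and scale, chosen only to make the arithmetic easy to follow.
result = euler_sample(x=5.0, sigmas=[10.0, 5.0, 1.0, 0.0],
                      denoise=toy_denoise, scale=6.0)
print(result)
```

Because the schedule ends at sigma = 0, the last Euler step lands on the guided prediction itself. Real samplers often batch the conditional and unconditional inputs into a single forward pass instead of calling the denoiser twice.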

Usage

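# conditioner_output and unconditional_output are the conditioner's outputs;
# T, C, H, and W are the frame count, latent channels, and pixel height and width.
with torch.no_grad():
    samples = model.sample(
        cond=conditioner_output,
        uc=unconditional_output,
        batch_size=1,
        shape=(T, C, H // 8, W // 8),
    )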

Code Reference

Source Location

| File | Lines | Description |
|---|---|---|
| sat/diffusion_video.py | L250-287 | SATVideoDiffusionEngine.sample method |
| sat/sgm/modules/diffusionmodules/sampling.py | L25-43 | BaseDiffusionSampler abstract class |

Signature

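from typing import Dict, Optional, Tuple

import torch

class SATVideoDiffusionEngine:
    def sample(
        self,
        cond: Dict,
        uc: Optional[Dict] = None,
        batch_size: int = 1,
        shape: Optional[Tuple[int, ...]] = None,       # (T, C, H // 8, W // 8); must be supplied
        concat_images: Optional[torch.Tensor] = None,  # For I2V
        ofs: Optional[torch.Tensor] = None,            # For I2V offset
    ) -> torch.Tensor:
        """Returns denoised latent tensor [B, T, C, H // 8, W // 8]"""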

Import

from diffusion_video import SATVideoDiffusionEngine

I/O Contract

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| cond | Dict | Required | Conditioner output dict containing text embeddings (crossattn, vector, concat keys) |
| uc | Dict | None | Unconditional conditioner output for classifier-free guidance |
| batch_size | int | 1 | Number of samples to generate in parallel |
| shape | Tuple[int, ...] | Required | Latent shape (T, C, H//F, W//F), where F = 8 is the spatial downsampling factor |
| concat_images | torch.Tensor | None | Image latents for I2V mode, concatenated along the channel dimension |
| ofs | torch.Tensor | None | Temporal offset embedding for I2V, typically [2.0] |

Outputs

| Output | Type | Description |
|---|---|---|
| Return value | torch.Tensor | Denoised latent tensor of shape [B, T, C, H//F, W//F] |
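The shape arithmetic in the I/O contract can be checked with a small helper. This is a minimal sketch: latent_shape and output_shape are hypothetical names that do not exist in the repository, the divisibility check is an added assumption, and the frame count and resolution are illustrative values only.

```python
from typing import Tuple


def latent_shape(num_frames: int, channels: int, height: int, width: int,
                 factor: int = 8) -> Tuple[int, int, int, int]:
    """Build the (T, C, H // F, W // F) tuple passed as `shape`."""
    # Assumed guard: reject sizes that the downsampling factor does not divide.
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (num_frames, channels, height // factor, width // factor)


def output_shape(batch_size: int, shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Expected shape of the returned latents: batch dimension first."""
    return (batch_size, *shape)


shape = latent_shape(num_frames=13, channels=16, height=480, width=720)
print(shape)                   # (13, 16, 60, 90)
print(output_shape(1, shape))  # (1, 13, 16, 60, 90)
```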

Usage Examples

Example 1: Text-to-video sampling

import torch
from diffusion_video import SATVideoDiffusionEngine

# Assume model is loaded and args are parsed
T = args.sampling_num_frames
H, W = args.sampling_image_size
C = 16  # Latent channels

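# Encode text prompt (simplified: the exact conditioner input format depends on
# the configured embedders)
text_prompt = "a short description of the video"  # placeholder prompt
cond = model.conditioner(text_prompt)
uc = model.conditioner("")  # Empty prompt for CFG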

with torch.no_grad():
    samples = model.sample(
        cond=cond,
        uc=uc,
        batch_size=1,
        shape=(T, C, H // 8, W // 8),
    )
# samples shape: [1, T, C, H//8, W//8]

Example 2: Image-to-video sampling

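import torch

# Reuses model, cond, uc, T, C, H, and W from Example 1.
# Encode source image through VAE (source_image is assumed to be a preprocessed image tensor)
image_latents = model.encode_first_stage(source_image)

# Set I2V offset on the same device as the latents
ofs = torch.tensor([2.0], device=image_latents.device)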

with torch.no_grad():
    samples = model.sample(
        cond=cond,
        uc=uc,
        batch_size=1,
        shape=(T, C, H // 8, W // 8),
        concat_images=image_latents,
        ofs=ofs,
    )
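The noise initialization and channel-wise concatenation that the I2V path relies on can be illustrated with array shapes alone. This is a minimal sketch that uses NumPy in place of torch so it runs without a GPU. The helper names init_noise and concat_image_latents are hypothetical, and it assumes the image latents already have the same [B, T, C, H, W] shape as the noise, which the actual pipeline may arrange differently.

```python
import numpy as np


def init_noise(batch_size, shape, seed=0):
    """Draw Gaussian noise of shape [B, T, C, H, W] from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((batch_size, *shape)).astype(np.float32)


def concat_image_latents(noise, image_latents):
    """Concatenate image latents to the noise along the channel axis (axis 2)."""
    if noise.shape != image_latents.shape:
        raise ValueError("noise and image latents must share a shape")
    return np.concatenate([noise, image_latents], axis=2)


# Small illustrative sizes: T=4 frames, C=16 channels, 6x9 latent grid.
noise = init_noise(batch_size=1, shape=(4, 16, 6, 9))
image_latents = np.zeros_like(noise)  # stand-in for encoded image latents
model_input = concat_image_latents(noise, image_latents)
print(model_input.shape)
```

The channel count doubles while every other dimension is unchanged, so under this assumption a denoiser used for I2V has to accept twice as many input channels as one used for text-to-video.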
