Implementation:Zai org CogVideo SAT Diffusion Sample
| Attribute | Value |
|---|---|
| Implementation Name | SAT Diffusion Sample |
| Workflow | SAT Video Generation |
| Step | 4 of 5 |
| Type | API Doc |
| Source File | sat/diffusion_video.py:L250-287, sat/sgm/modules/diffusionmodules/sampling.py:L25-43 |
| Repository | zai-org/CogVideo |
| External Dependencies | torch, sat.mpu, sgm.modules.diffusionmodules.sampling |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Implementation of the diffusion sampling method on the SATVideoDiffusionEngine class. The sample method orchestrates the iterative denoising process using the EulerEDM sampler, classifier-free guidance, and optional image conditioning for I2V generation.
Description
The `sample` method:
- Initializes random Gaussian noise of the specified shape
- Delegates to the configured sampler (EulerEDM) via `BaseDiffusionSampler`
- The sampler iterates over the timestep schedule, calling the denoiser at each step
- Classifier-free guidance is applied by running both conditional and unconditional forward passes
- For I2V, `concat_images` are concatenated to the noise and `ofs` provides the temporal offset embedding
- Returns the final denoised latent tensor
The BaseDiffusionSampler at sgm/modules/diffusionmodules/sampling.py defines the abstract sampling interface and timestep scheduling.
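The loop described above can be sketched in miniature. This is an illustrative reconstruction of an Euler EDM step with classifier-free guidance, not the repository code: `toy_denoiser` is a hypothetical stand-in for the real transformer denoiser, and the sigma schedule is a toy descending ramp rather than the actual Karras schedule.

```python
import numpy as np

def toy_denoiser(x, sigma, cond_strength):
    # Hypothetical denoiser: pulls x toward a conditioning-dependent target.
    return x / (sigma + 1.0) + cond_strength

def sample_euler_cfg(shape, sigmas, guidance_scale=6.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape) * sigmas[0]  # start from scaled noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        cond = toy_denoiser(x, sigma, cond_strength=1.0)    # conditional pass
        uncond = toy_denoiser(x, sigma, cond_strength=0.0)  # unconditional pass
        denoised = uncond + guidance_scale * (cond - uncond)  # CFG combine
        d = (x - denoised) / sigma            # EDM derivative estimate
        x = x + d * (sigma_next - sigma)      # Euler step toward sigma_next
    return x

sigmas = np.linspace(14.6, 0.0, 8)  # toy schedule, descending to zero
latents = sample_euler_cfg((1, 4, 8, 8), sigmas)
```

The two denoiser calls per step are the cost of classifier-free guidance; the real engine batches the conditional and unconditional passes together.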
Usage
```python
with torch.no_grad():
    samples = model.sample(
        cond=conditioner_output,
        uc=unconditional_output,
        batch_size=1,
        shape=(T, C, H // 8, W // 8),
    )
```
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| sat/diffusion_video.py | L250-287 | SATVideoDiffusionEngine.sample method |
| sat/sgm/modules/diffusionmodules/sampling.py | L25-43 | BaseDiffusionSampler abstract class |
Signature
```python
class SATVideoDiffusionEngine:
    def sample(
        self,
        cond: Dict,
        uc: Dict = None,
        batch_size: int = 1,
        shape: Tuple = (T, C, H, W),
        concat_images: torch.Tensor = None,  # For I2V
        ofs: torch.Tensor = None,  # For I2V offset
    ) -> torch.Tensor:
        """Returns denoised latent tensor [B, T, C, H, W]"""
```
Import
```python
from diffusion_video import SATVideoDiffusionEngine
```
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `cond` | Dict | Required | Conditioner output dict containing text embeddings (`crossattn`, `vector`, `concat` keys) |
| `uc` | Dict | None | Unconditional conditioner output for classifier-free guidance |
| `batch_size` | int | 1 | Number of samples to generate in parallel |
| `shape` | Tuple[int, ...] | Required | Latent shape (T, C, H//F, W//F) where F=8 |
| `concat_images` | torch.Tensor | None | Image latents for I2V mode, concatenated along the channel dimension |
| `ofs` | torch.Tensor | None | Temporal offset embedding for I2V, typically [2.0] |
Outputs
| Output | Type | Description |
|---|---|---|
| Return value | torch.Tensor | Denoised latent tensor of shape [B, T, C, H//F, W//F] |
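A small helper illustrates how the latent `shape` argument relates to pixel dimensions with the F=8 spatial downsampling factor from the tables above. The function name and defaults here are hypothetical, for illustration only.

```python
# Hypothetical helper: derive the latent shape (T, C, H//F, W//F)
# from pixel dimensions, with spatial downsampling factor F = 8.
def latent_shape(num_frames, height, width, channels=16, factor=8):
    # Pixel dimensions must be divisible by the downsampling factor.
    assert height % factor == 0 and width % factor == 0
    return (num_frames, channels, height // factor, width // factor)

print(latent_shape(13, 480, 720))  # → (13, 16, 60, 90)
```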
Usage Examples
Example 1: Text-to-video sampling
```python
import torch
from diffusion_video import SATVideoDiffusionEngine

# Assume model is loaded and args are parsed
T = args.sampling_num_frames
H, W = args.sampling_image_size
C = 16  # Latent channels

# Encode text prompt
cond = model.conditioner(text_prompt)
uc = model.conditioner("")  # Empty prompt for CFG

with torch.no_grad():
    samples = model.sample(
        cond=cond,
        uc=uc,
        batch_size=1,
        shape=(T, C, H // 8, W // 8),
    )
# samples shape: [1, T, C, H//8, W//8]
```
Example 2: Image-to-video sampling
```python
import torch

# Encode source image through the VAE
image_latents = model.encode_first_stage(source_image)

# Set I2V offset
ofs = torch.tensor([2.0], device="cuda")

with torch.no_grad():
    samples = model.sample(
        cond=cond,
        uc=uc,
        batch_size=1,
        shape=(T, C, H // 8, W // 8),
        concat_images=image_latents,
        ofs=ofs,
    )
```
Related Pages
- Principle:Zai_org_CogVideo_Diffusion_Sampling -- Principle governing diffusion sampling with EulerEDM and CFG
- Environment:Zai_org_CogVideo_SAT_Framework_Environment
- Zai_org_CogVideo_SAT_Read_From_CLI_File -- Previous step: prompt input
- Zai_org_CogVideo_SAT_Decode_First_Stage_Export -- Next step: decoding latents and exporting video
- Zai_org_CogVideo_SAT_Get_Model_Load_Checkpoint -- Model loading that prepares the model for sampling