Principle:Zai org CogVideo SAT Prompt Input

Attribute	Value
Principle Name	SAT Prompt Input
Workflow	SAT Video Generation
Step	3 of 5
Type	Data Input
Repository	zai-org/CogVideo
Paper	CogVideoX
Last Updated	2026-02-10 00:00 GMT

Overview

Technique for supplying text prompts to the SAT video generation pipeline, either interactively from the command line or in batch from a text file. Interactive mode handles one prompt at a time; file mode processes many prompts and can shard them across distributed workers.

Description

SAT supports two prompt input modes:

Interactive CLI (read_from_cli): Prompts the user for text input interactively. Each prompt is yielded one at a time as a (text, count) tuple.
File-based input (read_from_file): Reads prompts from a text file for batch generation. Supports distributed generation by sharding prompts across workers based on rank and world_size.

For image-to-video (I2V) generation, prompts use the format "text@@image_path", where the text description and source image path are separated by the @@ delimiter.
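A minimal parsing sketch for this format follows. The helper split_i2v_prompt is hypothetical, and splitting on the last @@ (so the delimiter may also appear inside the description) is an assumed design choice, not documented behavior.

```python
from typing import Optional, Tuple

DELIMITER = "@@"  # separates the text description from the source image path


def split_i2v_prompt(line: str) -> Tuple[str, Optional[str]]:
    """Split a 'text@@image_path' prompt into (text, image_path).

    Hypothetical helper: returns (text, None) when no delimiter is present,
    so plain text-to-video prompts pass through unchanged.
    """
    line = line.strip()
    if DELIMITER not in line:
        return line, None
    # Assumption: split on the LAST '@@', so '@@' inside the text is kept.
    text, image_path = line.rsplit(DELIMITER, 1)
    return text.strip(), image_path.strip()


print(split_i2v_prompt("A cat surfing a wave@@/data/cat.png"))
# ('A cat surfing a wave', '/data/cat.png')
```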

Both input modes are implemented as Python generators that yield (text, count) tuples, where count is an index used to name output files. In file mode it is the prompt's position in the file, so indices stay unique across workers.

Usage

Use SAT Prompt Input after model loading and before the sampling loop. The input mode is selected by the --input-type configuration parameter. For batch processing across multiple GPUs, file-based input distributes prompts across workers in round-robin order, so each worker receives nearly the same number of prompts.
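The two generators described above, plus a dispatch on the --input-type value, can be sketched as follows. This is a minimal sketch, not the repository's code: the names read_from_cli and read_from_file come from this page, but the helper get_prompt_iter, the parameter names path and input_file, the input-type values "cli" and "txt", and the terminal prompt string are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def read_from_cli() -> Iterator[Tuple[str, int]]:
    """Yield (text, count) for each prompt typed at the terminal until EOF (Ctrl-D)."""
    count = 0
    while True:
        try:
            text = input("Enter a prompt (Ctrl-D to quit): ")
        except EOFError:
            return  # end of the interactive session
        yield text.strip(), count
        count += 1


def read_from_file(path: str, rank: int = 0, world_size: int = 1) -> Iterator[Tuple[str, int]]:
    """Yield (text, count) for the lines of `path` assigned to this worker.

    count is the line's 0-based index in the file, so output names stay
    unique across workers.
    """
    with open(path, "r", encoding="utf-8") as fin:
        for count, line in enumerate(fin):  # file objects yield one line at a time
            if count % world_size != rank:
                continue  # this line belongs to another worker
            yield line.strip(), count


def get_prompt_iter(input_type: str, input_file: Optional[str] = None,
                    rank: int = 0, world_size: int = 1) -> Iterator[Tuple[str, int]]:
    """Hypothetical dispatch on an --input-type value ("cli" or "txt" are assumed)."""
    if input_type == "cli":
        return read_from_cli()
    if input_type == "txt":
        if input_file is None:
            raise ValueError("file input requires an input file path")
        return read_from_file(input_file, rank=rank, world_size=world_size)
    raise ValueError(f"unknown input type: {input_type!r}")
```

In a multi-GPU run, rank and world_size would come from the distributed runtime (for example torch.distributed.get_rank() and torch.distributed.get_world_size()); here they are plain arguments so the sketch runs without a cluster.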

Theoretical Basis

Generator-based prompt reading enables streaming of large prompt files without loading all prompts into memory simultaneously. This is particularly important for batch generation scenarios with thousands of prompts.
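The streaming behavior can be checked with a small sketch: a generator over an iterable of lines (a file object iterates the same way) pulls input only as prompts are requested. iter_prompts and CountingLines are hypothetical names, and a generator expression stands in for a large prompt file.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def iter_prompts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (text, count) pairs one line at a time, never buffering the input."""
    for count, line in enumerate(lines):
        yield line.strip(), count


class CountingLines:
    """Iterator over lines that records how many lines have been pulled."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._it = iter(lines)
        self.consumed = 0

    def __iter__(self) -> "CountingLines":
        return self

    def __next__(self) -> str:
        line = next(self._it)  # propagates StopIteration at the end of input
        self.consumed += 1
        return line


# Stand-in for a large prompt file: one million lines produced on demand.
source = CountingLines(f"prompt {i}\n" for i in range(1_000_000))
first_three = list(islice(iter_prompts(source), 3))
print(first_three)      # [('prompt 0', 0), ('prompt 1', 1), ('prompt 2', 2)]
print(source.consumed)  # 3
```

Only the three requested lines are ever read; the remaining 999,997 are never materialized.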

Worker-based sharding ensures even distribution in multi-GPU inference:

Worker with rank r and world_size W processes the prompts at indices i where i mod W == r.
Per-worker prompt counts differ by at most one, so the load is balanced when prompts cost roughly the same to generate.
Each worker computes its subset from r and W alone, so no inter-worker communication or synchronization is needed.
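The round-robin rule above can be illustrated with a short sketch that computes each worker's indices and checks that the shards partition the prompts with balanced sizes. shard_indices is a hypothetical helper, not part of the repository.

```python
from typing import Dict, List


def shard_indices(num_prompts: int, world_size: int) -> Dict[int, List[int]]:
    """Map each rank r to the prompt indices i with i % world_size == r."""
    # range(r, n, W) enumerates exactly the i in [0, n) congruent to r mod W
    return {r: list(range(r, num_prompts, world_size)) for r in range(world_size)}


shards = shard_indices(num_prompts=10, world_size=4)
print(shards)
# {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}

# Every prompt is assigned to exactly one worker...
all_indices = sorted(i for idx in shards.values() for i in idx)
assert all_indices == list(range(10))
# ...and per-worker counts differ by at most one.
sizes = [len(idx) for idx in shards.values()]
assert max(sizes) - min(sizes) <= 1
```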

The "text@@image_path" format for I2V prompts provides a simple, file-compatible way to associate text prompts with source images without requiring a separate metadata format.

Related Pages

Implementation:Zai_org_CogVideo_SAT_Read_From_CLI_File -- Implementation of CLI and file-based prompt reading
Zai_org_CogVideo_SAT_Model_Loading_for_Inference -- Previous step: model loading
Zai_org_CogVideo_Diffusion_Sampling -- Next step: diffusion sampling using the prompt text
Zai_org_CogVideo_SAT_Inference_Configuration -- Configuration that selects the input mode
