Principle:Zai org CogVideo SAT Prompt Input
| Attribute | Value |
|---|---|
| Principle Name | SAT Prompt Input |
| Workflow | SAT Video Generation |
| Step | 3 of 5 |
| Type | Data Input |
| Repository | zai-org/CogVideo |
| Paper | CogVideoX |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Technique for providing text prompts to the SAT video generation pipeline via interactive CLI or batch file input. The prompt input system supports both single-prompt interactive mode and multi-prompt batch mode with distributed worker sharding.
Description
SAT supports two prompt input modes:
- Interactive CLI (read_from_cli): Prompts the user for text input interactively. Each prompt is yielded one at a time as a (text, count) tuple.
- File-based input (read_from_file): Reads prompts from a text file for batch generation. Supports distributed generation by sharding prompts across workers based on rank and world_size.
For image-to-video (I2V) generation, prompts use the format "text@@image_path", where the text description and source image path are separated by the @@ delimiter.
Both input modes are implemented as Python generators, yielding (text, count) tuples where count is a sequential index used for output file naming.
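The file-based mode described above can be sketched as a small generator. This is a minimal illustration, not the repository's code: the function name read_prompts_from_file, its signature, and the demo helper are hypothetical, and it assumes one prompt per line, with count taken as the line's index in the file.

```python
import os
import tempfile
from typing import Iterator, List, Tuple


def read_prompts_from_file(
    path: str, rank: int = 0, world_size: int = 1
) -> Iterator[Tuple[str, int]]:
    """Lazily yield (text, count) for the lines assigned to this worker.

    Hypothetical sketch: assumes one prompt per line and uses the line
    index in the file as count.
    """
    with open(path, "r", encoding="utf-8") as fin:
        for count, line in enumerate(fin):
            # Skip lines that belong to other workers.
            if count % world_size != rank:
                continue
            yield line.strip(), count


def demo() -> List[Tuple[str, int]]:
    """Write three prompts to a temporary file and read worker 0's share of 2."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write("a cat\na dog\na bird\n")
        path = f.name
    try:
        return list(read_prompts_from_file(path, rank=0, world_size=2))
    finally:
        os.remove(path)
```

Because the file is iterated line by line, only the current prompt is held in memory, and count keeps the prompt's position in the file even when other workers handle the lines in between.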
Usage
Use SAT Prompt Input after model loading and before the sampling loop. The input mode is selected by the --input-type configuration parameter. For batch processing across multiple GPUs, file-based input assigns prompts to workers in round-robin order by rank, so per-worker loads differ by at most one prompt.
Theoretical Basis
Generator-based prompt reading enables streaming of large prompt files without loading all prompts into memory simultaneously. This is particularly important for batch generation scenarios with thousands of prompts.
Worker-based sharding ensures even distribution in multi-GPU inference:
- Worker with rank r and world_size W processes prompts at indices i where i mod W == r.
- This ensures load balancing without inter-worker communication.
- Each worker independently computes its prompt subset, eliminating synchronization overhead.
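The sharding rule above can be checked with a short sketch. The helper shard_indices is a hypothetical name used only for illustration; it applies the i mod W == r rule to a list of prompt indices and shows that the shards are disjoint, cover every prompt, and differ in size by at most one.

```python
from typing import List


def shard_indices(num_prompts: int, rank: int, world_size: int) -> List[int]:
    """Return the prompt indices i with i mod world_size == rank."""
    return [i for i in range(num_prompts) if i % world_size == rank]


# Example: 10 prompts split across 4 workers.
shards = [shard_indices(10, r, 4) for r in range(4)]
sizes = [len(s) for s in shards]

# Every prompt index appears in exactly one shard.
covered = sorted(i for shard in shards for i in shard)
```

Each worker needs only its own rank and the world size to compute its shard, so no prompt lists have to be exchanged between processes.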
The "text@@image_path" format for I2V prompts provides a simple, file-compatible way to associate text prompts with source images without requiring a separate metadata format.
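Splitting such a prompt might look like the following sketch. The helper split_i2v_prompt and the example path are hypothetical; the sketch assumes the delimiter appears once, with the text to its left and the image path to its right, and that surrounding whitespace can be trimmed.

```python
from typing import Tuple


def split_i2v_prompt(prompt: str) -> Tuple[str, str]:
    """Split a "text@@image_path" prompt into (text, image_path).

    Hypothetical helper: raises ValueError when the @@ delimiter is missing.
    """
    text, sep, image_path = prompt.partition("@@")
    if not sep:
        raise ValueError("expected a prompt of the form 'text@@image_path'")
    return text.strip(), image_path.strip()


# Example usage with an illustrative (made-up) image path.
text, image_path = split_i2v_prompt("A boat drifting on a lake@@inputs/boat.png")
```

Keeping the image path on the same line as the text means an I2V prompt file stays a plain text file with one generation request per line.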
Related Pages
- Implementation:Zai_org_CogVideo_SAT_Read_From_CLI_File -- Implementation of CLI and file-based prompt reading
- Zai_org_CogVideo_SAT_Model_Loading_for_Inference -- Previous step: model loading
- Zai_org_CogVideo_Diffusion_Sampling -- Next step: diffusion sampling using the prompt text
- Zai_org_CogVideo_SAT_Inference_Configuration -- Configuration that selects the input mode