Principle:Zai org CogVideo SAT Prompt Input

From Leeroopedia


Attribute        Value
Principle Name   SAT Prompt Input
Workflow         SAT Video Generation
Step             3 of 5
Type             Data Input
Repository       zai-org/CogVideo
Paper            CogVideoX
Last Updated     2026-02-10 00:00 GMT

Overview

Technique for providing text prompts to the SAT video generation pipeline via interactive CLI or batch file input. The prompt input system supports both single-prompt interactive mode and multi-prompt batch mode with distributed worker sharding.

Description

SAT supports two prompt input modes:

  1. Interactive CLI (read_from_cli): Prompts the user for text input interactively. Each prompt is yielded one at a time as a (text, count) tuple.
  2. File-based input (read_from_file): Reads prompts from a text file for batch generation. Supports distributed generation by sharding prompts across workers based on rank and world_size.

For image-to-video (I2V) generation, prompts use the format "text@@image_path", where the text description and source image path are separated by the @@ delimiter.
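The delimiter format can be illustrated with a small parser. This is a minimal sketch, not the repository's code: the helper name split_i2v_prompt and its error handling are assumptions made for illustration, and only the "@@" delimiter comes from the description above.

```python
from typing import Tuple

DELIMITER = "@@"  # delimiter named in the prompt format "text@@image_path"


def split_i2v_prompt(prompt: str) -> Tuple[str, str]:
    """Split an I2V prompt into (text, image_path).

    Hypothetical helper: splits on the first occurrence of the delimiter
    and rejects prompts where either half is missing.
    """
    text, sep, image_path = prompt.partition(DELIMITER)
    text, image_path = text.strip(), image_path.strip()
    if not sep or not text or not image_path:
        raise ValueError(f"expected 'text{DELIMITER}image_path', got: {prompt!r}")
    return text, image_path


# Example with a made-up prompt and path.
text, image_path = split_i2v_prompt("A cat surfing a wave@@inputs/cat.png")
```

Splitting on the first delimiter only is a design choice in this sketch; it keeps the text half intact if the path happens to contain "@@".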

Both input modes are implemented as Python generators, yielding (text, count) tuples where count is a sequential index used for output file naming.
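The two generators might look roughly like the sketch below. The names read_from_cli and read_from_file come from this page; the signatures, the prompt string, the UTF-8 encoding, and the iter_prompts helper are assumptions. The sketch also assumes that count is the global line index rather than a per-worker counter, so output names would not collide across workers.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_from_cli(prompt: str = "Enter a prompt (Ctrl-D to quit): ") -> Iterator[Tuple[str, int]]:
    """Yield (text, count) for each line typed by the user, until EOF."""
    count = 0
    while True:
        try:
            text = input(prompt)
        except EOFError:
            return  # end of input ends the generator
        yield text.strip(), count
        count += 1


def iter_prompts(lines: Iterable[str], rank: int = 0, world_size: int = 1) -> Iterator[Tuple[str, int]]:
    """Yield (text, count) for the lines that belong to this worker.

    count is the line's index in the full input (an assumption of this sketch).
    """
    for count, line in enumerate(lines):
        if count % world_size != rank:
            continue  # this line belongs to another worker
        yield line.strip(), count


def read_from_file(path: str, rank: int = 0, world_size: int = 1) -> Iterator[Tuple[str, int]]:
    """Stream prompts from a text file, one line at a time."""
    with open(path, "r", encoding="utf-8") as handle:
        yield from iter_prompts(handle, rank, world_size)


# Demo on an in-memory file: worker 0 of 2 sees lines 0 and 2.
demo = list(iter_prompts(io.StringIO("a cat\na dog\na bird\n"), rank=0, world_size=2))
```

Because the file handle is iterated lazily, only one line is held in memory at a time, regardless of how many prompts the file contains.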

Usage

Use SAT Prompt Input after model loading and before the sampling loop. The input mode is selected by the --input-type configuration parameter. For batch processing across multiple GPUs, file-based input assigns prompts to workers round-robin by rank, so each worker receives a near-equal share (shard sizes differ by at most one prompt).
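To show where prompt input sits in the pipeline, here is a hedged sketch of a loop that consumes (text, count) tuples and derives an output name from count. The functions run_sampling and output_stem, the naming scheme, and the sample_fn callback are all hypothetical stand-ins; the actual sampling call and file naming in the repository may differ.

```python
from typing import Callable, Iterable, List, Tuple


def output_stem(text: str, count: int, max_len: int = 40) -> str:
    """Hypothetical naming scheme: '<count>_<sanitized, truncated text>'."""
    safe = "".join(c if c.isalnum() else "_" for c in text)[:max_len]
    return f"{count}_{safe}"


def run_sampling(
    prompts: Iterable[Tuple[str, int]],
    sample_fn: Callable[[str], object],
) -> List[str]:
    """Consume prompts one at a time and return the output stems used.

    sample_fn stands in for the already-loaded model's sampling call.
    """
    stems = []
    for text, count in prompts:
        sample_fn(text)  # placeholder for video generation
        stems.append(output_stem(text, count))
    return stems


# Demo with a no-op sampler and prompts as a worker of rank 0 (of 2) might see them.
stems = run_sampling([("a cat", 0), ("a dog!", 2)], sample_fn=lambda text: None)
```

The loop accepts any iterable of (text, count) tuples, so the same code would serve both the interactive and the file-based mode.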

Theoretical Basis

Generator-based prompt reading enables streaming of large prompt files without loading all prompts into memory simultaneously. This is particularly important for batch generation scenarios with thousands of prompts.

Worker-based sharding distributes prompts near-evenly in multi-GPU inference:

  • A worker with rank r, out of W = world_size workers, processes the prompts at indices i where i mod W == r.
  • Shard sizes therefore differ by at most one prompt. This balances prompt counts, not per-prompt generation time.
  • Each worker computes its subset independently from r and W alone, so no inter-worker communication or synchronization is needed.
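The sharding rule can be checked with a few lines of Python. This is an illustrative sketch: shard_indices is a hypothetical helper that lists the indices a worker would process under the i mod W == r rule, using a made-up workload of 10 prompts on 4 workers.

```python
from typing import List


def shard_indices(num_prompts: int, rank: int, world_size: int) -> List[int]:
    """Return the prompt indices i with i % world_size == rank."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("require world_size >= 1 and 0 <= rank < world_size")
    # Starting at rank and stepping by world_size visits exactly those indices.
    return list(range(rank, num_prompts, world_size))


# 10 prompts across 4 workers.
shards = [shard_indices(10, rank, 4) for rank in range(4)]
sizes = [len(shard) for shard in shards]

# Every prompt is assigned to exactly one worker.
covered = sorted(i for shard in shards for i in shard)
```

In this example the shards are [0, 4, 8], [1, 5, 9], [2, 6], and [3, 7]: together they cover every index exactly once, and their sizes differ by at most one.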

The "text@@image_path" format for I2V prompts provides a simple, file-compatible way to associate text prompts with source images without requiring a separate metadata format.
