Environment:Zai_org_CogVideo_SAT_Framework_Environment
| Knowledge Sources | |
|---|---|
| Domains | Video_Generation, Deep_Learning, Finetuning, Inference |
| Last Updated | 2026-02-10 02:00 GMT |
Overview
Linux multi-GPU environment with Python 3.10+, PyTorch (pre-installed), CUDA, SwissArmyTransformer >= 0.4.12, DeepSpeed >= 0.15.3, and PyTorch Lightning >= 2.4.0 for SAT-based CogVideoX finetuning and video generation.
Description
This environment provides the SAT (SwissArmyTransformer) framework stack for fine-tuning and inference with CogVideoX models. Unlike the Diffusers-based pipeline, the SAT pipeline uses its own model format and training infrastructure built on PyTorch Lightning and DeepSpeed. The default configuration targets 8x A100 GPUs for distributed training. The SAT framework requires PyTorch to be pre-installed separately (not included in its requirements.txt).
Usage
Use this environment for SAT-based finetuning (LoRA or SFT) and SAT-based video generation workflows. This is the prerequisite for running the SAT training scripts (`sat/finetune_single_gpu.sh`, `sat/finetune_multi_gpus.sh`) and inference scripts (`sat/inference.sh`).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | CUDA-compatible OS required |
| Hardware (single GPU) | NVIDIA GPU with sufficient VRAM | Minimum depends on model size |
| Hardware (multi-GPU default) | 8x NVIDIA A100 GPUs | Explicitly targeted in sft.yaml config |
| Python | >= 3.10 | |
| CUDA | >= 11.0 | bf16 required for 5B models |
| PyTorch | Pre-installed separately | Not included in sat/requirements.txt |
Dependencies
System Packages
- NVIDIA CUDA Toolkit (compatible with PyTorch build)
Python Packages (Pre-installed)
- `torch` (version compatible with CUDA setup)
- `torchvision` (matching torch version)
Python Packages (from sat/requirements.txt)
- `SwissArmyTransformer` >= 0.4.12
- `omegaconf` >= 2.3.0
- `pytorch_lightning` >= 2.4.0
- `kornia` >= 0.7.3
- `beartype` >= 0.19.0
- `fsspec` >= 2024.2.0
- `safetensors` >= 0.4.5
- `scipy` >= 1.14.1
- `decord` >= 0.6.0
- `wandb` >= 0.18.5
- `deepspeed` >= 0.15.3
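The version floors above can be verified after installation. A minimal sketch (package names and minimums copied from `sat/requirements.txt`; the numeric tuple comparison is a simplification of full PEP 440 version ordering and ignores pre-release suffixes):

```python
from importlib.metadata import version, PackageNotFoundError

# Minimum versions from sat/requirements.txt
SAT_MINIMUMS = {
    "SwissArmyTransformer": "0.4.12",
    "omegaconf": "2.3.0",
    "pytorch_lightning": "2.4.0",
    "kornia": "0.7.3",
    "beartype": "0.19.0",
    "fsspec": "2024.2.0",
    "safetensors": "0.4.5",
    "scipy": "1.14.1",
    "decord": "0.6.0",
    "wandb": "0.18.5",
    "deepspeed": "0.15.3",
}

def as_tuple(v):
    """Naive numeric version tuple; ignores non-numeric suffixes."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check_requirements(minimums, get_version=version):
    """Return {package: problem} for anything missing or below its floor."""
    problems = {}
    for pkg, floor in minimums.items():
        try:
            installed = get_version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if as_tuple(installed) < as_tuple(floor):
            problems[pkg] = f"{installed} < {floor}"
    return problems
```

Running `check_requirements(SAT_MINIMUMS)` after `pip install` prints an empty dict on a correctly provisioned environment.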
Credentials
No API tokens are required. Models are loaded from local checkpoints.
Quick Install
# Install PyTorch first (must match your CUDA version)
pip install torch torchvision
# Install SAT dependencies (quote the specifiers so the shell does not treat ">" as redirection)
pip install "SwissArmyTransformer>=0.4.12" "omegaconf>=2.3.0" "pytorch_lightning>=2.4.0" \
    "kornia>=0.7.3" "beartype>=0.19.0" "fsspec>=2024.2.0" "safetensors>=0.4.5" \
    "scipy>=1.14.1" "decord>=0.6.0" "wandb>=0.18.5" "deepspeed>=0.15.3"
# Or install from the requirements file
pip install -r sat/requirements.txt
Code Evidence
Multi-GPU CUDA memory config from `sat/finetune_multi_gpus.sh:3-5`:
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
torchrun --standalone --nnodes=1 --nproc_per_node=8 train_video.py
Single-GPU environment variables from `sat/finetune_single_gpu.sh:4-9`:
export WORLD_SIZE=1
export RANK=0
export LOCAL_RANK=0
export LOCAL_WORLD_SIZE=1
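The same four variables can be set from Python before the training entry point is imported, which is convenient outside the shell script (a sketch mirroring `finetune_single_gpu.sh`; the variable names and values come from the script above):

```python
import os

# Mirror sat/finetune_single_gpu.sh: act as rank 0 of a 1-process world
SINGLE_GPU_ENV = {
    "WORLD_SIZE": "1",
    "RANK": "0",
    "LOCAL_RANK": "0",
    "LOCAL_WORLD_SIZE": "1",
}

def apply_single_gpu_env(env=os.environ):
    """Set distributed env vars for a single-process run; returns the mapping."""
    env.update(SINGLE_GPU_ENV)
    return env
```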
DeepSpeed config comment from `sat/configs/sft.yaml:31`:
# This setting is for 8 x A100 GPUs
deepspeed:
train_micro_batch_size_per_gpu: 2
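With a micro-batch of 2 on 8 GPUs, the effective global batch size follows the usual DeepSpeed relation (a sketch; the gradient-accumulation factor is an assumption, as it is not shown in the quoted config):

```python
def global_batch_size(micro_batch_per_gpu, num_gpus, grad_accum_steps=1):
    """DeepSpeed: train_batch_size = micro_batch * world_size * accumulation."""
    return micro_batch_per_gpu * num_gpus * grad_accum_steps

# The sft.yaml default: 2 per GPU on 8x A100
print(global_batch_size(2, 8))  # -> 16
```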
Precision configuration from `sat/configs/sft.yaml:47-50`:
bf16:
enabled: True # For CogVideoX-2B Turn to False and For CogVideoX-5B Turn to True
fp16:
enabled: False # For CogVideoX-2B Turn to True and For CogVideoX-5B Turn to False
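The comments above encode a simple rule: 2B trains in fp16, 5B in bf16. That selection can be sketched as follows (only the 2B/5B distinction comes from the config; the function and its error handling are illustrative):

```python
def precision_flags(model_name):
    """Return the sft.yaml fp16/bf16 switches for a CogVideoX variant."""
    name = model_name.upper()
    if "5B" in name:
        return {"fp16": {"enabled": False}, "bf16": {"enabled": True}}
    if "2B" in name:
        return {"fp16": {"enabled": True}, "bf16": {"enabled": False}}
    raise ValueError(f"Unknown CogVideoX variant: {model_name}")
```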
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| CUDA OOM during training | Insufficient VRAM per GPU | Use more GPUs, reduce batch size, or enable ZeRO-3 offloading |
| NCCL timeout | Slow initialization or data loading stalls collective ops | Increase the `timeout` passed to `torch.distributed.init_process_group`, or speed up startup/data loading |
| CUDA memory fragmentation | Long training runs on large models | Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` as in multi-GPU script |
| `lora_state_dict length is not 240` | Corrupted or incompatible LoRA checkpoint | CogVideoX transformer requires exactly 240 LoRA parameters (30 layers x 8 matrices) |
| Wrong precision for model | fp16/bf16 mismatch | CogVideoX-2B uses fp16; CogVideoX-5B uses bf16. Check sft.yaml settings. |
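The 240-parameter LoRA invariant from the table can be checked before loading, failing fast with a clear message (a sketch; the constants restate the 30 layers x 8 matrices figure above, and `lora_state_dict` is whatever mapping your checkpoint loader returns):

```python
NUM_LAYERS = 30          # CogVideoX transformer blocks
MATRICES_PER_LAYER = 8   # LoRA matrices per block
EXPECTED_LORA_PARAMS = NUM_LAYERS * MATRICES_PER_LAYER  # 240

def validate_lora_state_dict(lora_state_dict):
    """Raise early with a clear message instead of failing deep inside load."""
    n = len(lora_state_dict)
    if n != EXPECTED_LORA_PARAMS:
        raise ValueError(
            f"lora_state_dict length is {n}, expected {EXPECTED_LORA_PARAMS} "
            f"({NUM_LAYERS} layers x {MATRICES_PER_LAYER} matrices); the "
            "checkpoint may be corrupted or from an incompatible model"
        )
    return lora_state_dict
```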
Compatibility Notes
- CogVideoX-2B: Set `fp16.enabled: True` and `bf16.enabled: False` in sft.yaml.
- CogVideoX-5B: Set `bf16.enabled: True` and `fp16.enabled: False` in sft.yaml.
- Single GPU: Use `finetune_single_gpu.sh` which manually sets WORLD_SIZE, RANK, etc.
- Multi-GPU: Use `finetune_multi_gpus.sh` with torchrun. Default is 8 GPUs.
- PYTORCH_CUDA_ALLOC_CONF: Set `expandable_segments:True` for multi-GPU to prevent CUDA memory fragmentation.
- PyTorch: Must be pre-installed before SAT dependencies; not included in sat/requirements.txt.
- Gradient checkpointing: Enabled via `checkpoint_activations: True` in model YAML configs.
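The allocator setting from the notes above can also be applied programmatically before CUDA is initialized (a sketch; the value is the one exported in `finetune_multi_gpus.sh`):

```python
import os

def configure_alloc_conf(env=os.environ):
    """Reduce CUDA memory fragmentation as in sat/finetune_multi_gpus.sh.
    Must run before the first CUDA allocation to take effect."""
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env
```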
Related Pages
- Implementation:Zai_org_CogVideo_SAT_Requirements_Install
- Implementation:Zai_org_CogVideo_SAT_VideoDataset
- Implementation:Zai_org_CogVideo_SAT_Get_Args
- Implementation:Zai_org_CogVideo_SATVideoDiffusionEngine_Init
- Implementation:Zai_org_CogVideo_SAT_Training_Main
- Implementation:Zai_org_CogVideo_SAT_Convert_Weight
- Implementation:Zai_org_CogVideo_SAT_Inference_Get_Args
- Implementation:Zai_org_CogVideo_SAT_Get_Model_Load_Checkpoint
- Implementation:Zai_org_CogVideo_SAT_Read_From_CLI_File
- Implementation:Zai_org_CogVideo_SAT_Diffusion_Sample
- Implementation:Zai_org_CogVideo_SAT_Decode_First_Stage_Export