# Heuristic: Zai org CogVideo LoRA Configuration Tips
| Knowledge Sources | |
|---|---|
| Domains | LoRA, Finetuning, Video_Generation |
| Last Updated | 2026-02-10 02:00 GMT |
## Overview
LoRA configuration for CogVideoX requires rank >= 64, lora_alpha equal to rank or rank//2 (NOT alpha=1), targeting Q/K/V/Out attention modules, with trainable parameters kept in float32.
## Description
Low-Rank Adaptation (LoRA) for CogVideoX video diffusion models requires significantly higher rank and different alpha scaling than typical image model LoRA configurations. The default lora_alpha=1 used in many LoRA implementations performs poorly with CogVideoX because the effective scaling factor (alpha/rank) becomes too small. The repository maintainers found that setting alpha equal to rank or rank//2 produces substantially better results.
## Usage
Apply these tips when configuring LoRA fine-tuning for any CogVideoX model. These recommendations override the defaults found in generic LoRA tutorials, which typically recommend rank=4-16 and alpha=1.
## The Insight (Rule of Thumb)
- Rank: Use 64 or higher (default: 128). Video generation requires more expressive adapters than image models due to the temporal dimension.
- Alpha: Set `lora_alpha` equal to `rank` or `rank // 2` (default: 64 with rank=128). Do NOT use alpha=1.
- Target modules: `["to_q", "to_k", "to_v", "to_out.0"]` — attention projection layers only.
- Trainable param dtype: Must be kept in float32 even during mixed-precision training for numerical stability.
- Trade-off: Higher rank = more trainable parameters = more VRAM usage. Rank 128 with CogVideoX-5B requires ~24GB VRAM (LoRA DDP).
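Putting these rules together, a LoRA adapter configuration might look like the following sketch using the `peft` library. This is an illustration, not the repository's exact training code; the module names follow the diffusers attention naming listed above.

```python
from peft import LoraConfig

# Sketch: LoRA config following the repository's recommendations.
# rank >= 64, lora_alpha equal to rank // 2 (or rank), and attention
# projection layers only as target modules.
lora_config = LoraConfig(
    r=128,          # rank: 64 or higher recommended for video models
    lora_alpha=64,  # rank // 2 here; do NOT leave this at 1
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```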
## Reasoning
The effective LoRA scaling factor is `alpha / rank`. With the common default of alpha=1 and rank=128, the scaling would be 1/128 ≈ 0.008, making the LoRA contribution negligibly small. The repository's recommendation of alpha=64 with rank=128 gives a scaling of 0.5, which provides meaningful model adaptation.
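The comparison between the two scaling factors can be checked with a few lines of plain Python (illustrative only):

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Effective LoRA scaling factor applied to the adapter output."""
    return alpha / rank

# Common default: the adapter's contribution is nearly invisible.
weak = lora_scaling(alpha=1, rank=128)         # ~0.008

# Repository recommendation: meaningful adaptation strength.
recommended = lora_scaling(alpha=64, rank=128)  # 0.5

print(f"alpha=1:  {weak:.4f}")
print(f"alpha=64: {recommended}")
```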
From `finetune/README.md:145-148`:
> "The original repository uses `lora_alpha` set to 1. We found that this value performed poorly in several runs, possibly due to differences in the model backend and training settings. Our recommendation is to set `lora_alpha` to be equal to the rank or `rank // 2`. It's advised to use a rank of 64 or higher."
Default configuration from `finetune/schemas/args.py:75-77`:
```python
rank: int = 128
lora_alpha: int = 64
target_modules: List[str] = ["to_q", "to_k", "to_v", "to_out.0"]
```
Float32 casting requirement from `finetune/trainer.py:265-266`:
```python
# Make sure the trainable params are in float32
cast_training_params([self.components.transformer], dtype=torch.float32)
```
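The float32 requirement matters because low-precision formats cannot represent small parameter updates. A minimal NumPy illustration (not from the repository) of the underflow that motivates the cast:

```python
import numpy as np

# In float16, the spacing between representable values near 1.0 is
# about 2**-10 ~= 0.000977, so a small update like 1e-4 rounds away.
w16 = np.float16(1.0)
print(w16 + np.float16(1e-4) == w16)  # True: the update is lost

# In float32 (spacing ~1.2e-7 near 1.0), the same update survives.
w32 = np.float32(1.0)
print(w32 + np.float32(1e-4) == w32)  # False: the update is applied
```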
SAT LoRA structural invariant from `tools/export_sat_lora_weight.py:50-51`:
```python
if len(lora_state_dict) != 240:
    raise ValueError("lora_state_dict length is not 240")
# 30 layers x 8 matrices (q/k/v/out x A/B) = 240
```
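The 240 figure follows directly from the model shape described in the comment above. A hypothetical sketch (layer count and matrix names assumed from that comment, not read from the checkpoint format itself) of how the invariant arises and might be validated:

```python
NUM_LAYERS = 30                       # transformer blocks in the SAT checkpoint
PROJECTIONS = ["q", "k", "v", "out"]  # attention projections carrying LoRA
LORA_MATRICES = ["A", "B"]            # each projection has an A and a B matrix

# 30 layers x 4 projections x 2 matrices = 240 state-dict entries
EXPECTED_KEYS = NUM_LAYERS * len(PROJECTIONS) * len(LORA_MATRICES)
print(EXPECTED_KEYS)  # 240

def check_lora_state_dict(lora_state_dict: dict) -> None:
    """Mirror the structural invariant from export_sat_lora_weight.py."""
    if len(lora_state_dict) != EXPECTED_KEYS:
        raise ValueError(f"lora_state_dict length is not {EXPECTED_KEYS}")
```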