
Heuristic:Zai org CogVideo LoRA Configuration Tips

From Leeroopedia



Knowledge Sources
Domains LoRA, Finetuning, Video_Generation
Last Updated 2026-02-10 02:00 GMT

Overview

LoRA configuration for CogVideoX requires rank >= 64, lora_alpha equal to rank or rank//2 (NOT alpha=1), targeting Q/K/V/Out attention modules, with trainable parameters kept in float32.

Description

Low-Rank Adaptation (LoRA) for CogVideoX video diffusion models requires significantly higher rank and different alpha scaling than typical image model LoRA configurations. The default lora_alpha=1 used in many LoRA implementations performs poorly with CogVideoX because the effective scaling factor (alpha/rank) becomes too small. The repository maintainers found that setting alpha equal to rank or rank//2 produces substantially better results.

Usage

Apply these tips when configuring LoRA fine-tuning for any CogVideoX model. These recommendations override the defaults from generic LoRA tutorials, which typically recommend rank=4-16 and alpha=1.

The Insight (Rule of Thumb)

  • Rank: Use 64 or higher (default: 128). Video generation requires more expressive adapters than image models due to the temporal dimension.
  • Alpha: Set `lora_alpha` equal to `rank` or `rank // 2` (default: 64 with rank=128). Do NOT use alpha=1.
  • Target modules: `["to_q", "to_k", "to_v", "to_out.0"]` — attention projection layers only.
  • Trainable param dtype: Must be kept in float32 even during mixed-precision training for numerical stability.
  • Trade-off: Higher rank = more trainable parameters = more VRAM usage. Rank 128 with CogVideoX-5B requires ~24GB VRAM (LoRA DDP).
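The rules of thumb above can be condensed into a small helper. This is an illustrative sketch only; the function name and dict layout are ours, not from the repository:

```python
# Illustrative helper encoding the rule of thumb above (not repo code).
def make_lora_config(rank: int) -> dict:
    if rank < 64:
        raise ValueError("Use rank >= 64 for CogVideoX LoRA")
    return {
        "rank": rank,
        "lora_alpha": rank // 2,  # alpha = rank or rank // 2; never alpha=1
        "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],  # attention projections only
    }

cfg = make_lora_config(128)  # matches the repo defaults: rank=128, alpha=64
```

The same values map directly onto a PEFT `LoraConfig` (`r`, `lora_alpha`, `target_modules`) if you are wiring this up with the `peft` library.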

Reasoning

The effective LoRA scaling factor is `alpha / rank`. With the common default of alpha=1 and rank=128, the scaling would be 1/128 ≈ 0.008, making the LoRA contribution negligibly small. The repository's recommendation of alpha=64 with rank=128 gives a scaling of 0.5, which provides meaningful model adaptation.
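The arithmetic is worth making explicit:

```python
# Effective LoRA scaling factor is alpha / rank.
rank = 128
scaling_default = 1 / rank   # alpha=1  -> ~0.008, LoRA contribution is negligible
scaling_repo = 64 / rank     # alpha=64 -> 0.5, meaningful adaptation
```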

From `finetune/README.md:145-148`:

"The original repository uses `lora_alpha` set to 1. We found that this value performed poorly in several runs, possibly due to differences in the model backend and training settings. Our recommendation is to set `lora_alpha` to be equal to the rank or `rank // 2`. It's advised to use a rank of 64 or higher."

Default configuration from `finetune/schemas/args.py:75-77`:

rank: int = 128
lora_alpha: int = 64
target_modules: List[str] = ["to_q", "to_k", "to_v", "to_out.0"]

Float32 casting requirement from `finetune/trainer.py:265-266`:

# Make sure the trainable params are in float32
cast_training_params([self.components.transformer], dtype=torch.float32)
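A minimal sketch of what this casting step does, assuming a PEFT-style setup where only the adapter parameters require gradients (the helper below is ours, not the repo's; the real trainer uses `cast_training_params` from `diffusers.training_utils`):

```python
import torch
import torch.nn as nn

def cast_trainable_to_fp32(model: nn.Module) -> None:
    # Upcast only params that receive gradients; frozen base weights stay low-precision.
    for p in model.parameters():
        if p.requires_grad:
            p.data = p.data.to(torch.float32)

m = nn.Linear(8, 8).to(torch.bfloat16)  # stand-in for a bf16 base layer
m.weight.requires_grad_(False)          # frozen base weight stays bf16
m.bias.requires_grad_(True)             # stand-in for a trainable adapter param
cast_trainable_to_fp32(m)
```

Keeping the base model in bf16 saves memory, while the small set of trainable LoRA parameters is held in float32 for numerically stable gradient updates.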

SAT LoRA structural invariant from `tools/export_sat_lora_weight.py:50-51`:

if len(lora_state_dict) != 240:
    raise ValueError("lora_state_dict length is not 240")
# 30 layers x 8 matrices (q/k/v/out x A/B) = 240
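The 240 figure follows directly from the model structure; an illustrative sanity check (not repo code):

```python
# SAT export invariant: every attention projection in every layer
# contributes one LoRA A matrix and one LoRA B matrix.
NUM_LAYERS = 30     # SAT transformer layers
PROJECTIONS = 4     # q, k, v, out
LORA_MATRICES = 2   # A and B per projection
expected_entries = NUM_LAYERS * PROJECTIONS * LORA_MATRICES  # 240
```

A state dict with any other length means the LoRA was trained with different target modules or a different layer count and cannot be exported to the SAT format.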
