Implementation:Zai_org_CogVideo_CogVideoX_LoRA_Trainer_Load_Components
| Implementation Metadata | |
|---|---|
| Name | CogVideoX_LoRA_Trainer_Load_Components |
| Type | API Doc |
| Category | Model_Architecture |
| Domains | Video_Generation, Fine_Tuning, Diffusion_Models |
| Knowledge Sources | CogVideo Repository, CogVideoX Paper, LoRA Paper |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
CogVideoX_LoRA_Trainer_Load_Components is a concrete tool for loading CogVideoX model components and configuring LoRA adapters, provided by the CogVideo finetune package.
Description
This implementation provides the load_components method on both CogVideoXT2VLoraTrainer and CogVideoXI2VLoraTrainer classes. The method loads all pretrained model sub-components (tokenizer, text encoder, transformer, VAE, scheduler) from a HuggingFace-format checkpoint directory. After loading, the prepare_trainable_parameters method in the base Trainer class applies a LoraConfig to the transformer, attaching low-rank adapters to the specified attention modules.
Usage
Use this implementation when initializing a CogVideoX LoRA fine-tuning session. The load_components method is called automatically by the trainer during initialization, and the LoRA configuration (rank, alpha, target modules) is then applied from the validated Args.
Code Reference
Source Location
- finetune/models/cogvideox_t2v/lora_trainer.py:L26-48 -- T2V load_components
- finetune/models/cogvideox_i2v/lora_trainer.py:L27-49 -- I2V load_components
- finetune/trainer.py:L223-253 -- prepare_trainable_parameters with LoRA injection
Signature
class CogVideoXT2VLoraTrainer(Trainer):
    UNLOAD_LIST = ["text_encoder", "vae"]

    @override
    def load_components(self) -> Components:
        tokenizer = AutoTokenizer.from_pretrained(
            self.args.model_path, subfolder="tokenizer"
        )
        text_encoder = T5EncoderModel.from_pretrained(
            self.args.model_path, subfolder="text_encoder"
        )
        transformer = CogVideoXTransformer3DModel.from_pretrained(
            self.args.model_path, subfolder="transformer"
        )
        vae = AutoencoderKLCogVideoX.from_pretrained(
            self.args.model_path, subfolder="vae"
        )
        scheduler = CogVideoXDPMScheduler.from_pretrained(
            self.args.model_path, subfolder="scheduler"
        )
        return Components(tokenizer, text_encoder, transformer, vae, scheduler)
LoRA configuration applied in trainer.py:
transformer_lora_config = LoraConfig(
    r=args.rank,
    lora_alpha=args.lora_alpha,
    init_lora_weights=True,
    target_modules=args.target_modules,
)
transformer.add_adapter(transformer_lora_config)
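To make the effect of the injected adapter concrete, here is a minimal sketch of the computation LoRA adds to each targeted projection, using scalar stand-ins for the weight matrices. The function and values are illustrative only and are not part of the CogVideo codebase:

```python
# Sketch of an adapted projection: output = W x + (alpha / r) * B(A(x)).
# Scalars stand in for the frozen weight W and the low-rank factors A, B.
def lora_forward(x, w, a, b, rank=128, alpha=64):
    base = w * x                 # frozen pretrained projection
    delta = b * (a * x)          # low-rank update, B @ A @ x in matrix form
    return base + (alpha / rank) * delta

# With init_lora_weights=True, PEFT zero-initializes the B factor, so the
# adapter starts as a no-op and the output equals the base projection.
assert lora_forward(2.0, 3.0, a=0.7, b=0.0) == 6.0
```

Because the scale is alpha / rank (64 / 128 = 0.5 with the defaults), raising the rank without raising alpha shrinks the adapter's per-parameter contribution.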
Import
from finetune.models.cogvideox_t2v.lora_trainer import CogVideoXT2VLoraTrainer
from finetune.models.cogvideox_i2v.lora_trainer import CogVideoXI2VLoraTrainer
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | Path | required | Path to the pretrained CogVideoX model (HuggingFace format with subdirectories). |
| r (rank) | int | 128 | Rank of the LoRA low-rank matrices. |
| lora_alpha | int | 64 | LoRA scaling factor (effective scale = alpha / rank). |
| target_modules | List[str] | ["to_q", "to_k", "to_v", "to_out.0"] | Transformer attention modules to which LoRA adapters are applied. |
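The parameters above determine both the adapter's scale and its size. The following back-of-the-envelope sketch estimates the parameter count added per attention block under the defaults; the hidden width of 3072 is an assumption about the transformer's projection size, not a value read from the checkpoint:

```python
# Estimating LoRA adapter size under the default configuration.
rank = 128        # default r
lora_alpha = 64   # default lora_alpha
hidden = 3072     # ASSUMED attention projection width, for illustration

# Effective scale applied to the adapter output.
scale = lora_alpha / rank            # 0.5

# Each adapted square Linear gains two factors: A (rank x hidden) and
# B (hidden x rank).
params_per_module = 2 * rank * hidden

# Four target modules (to_q, to_k, to_v, to_out.0) per attention block.
params_per_block = 4 * params_per_module

print(scale, params_per_module, params_per_block)
```

Multiplying by the number of transformer blocks gives a rough total, which is how the "~50M trainable out of ~5B" figure in the usage example arises.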
External Dependencies
- diffusers -- CogVideoXPipeline, AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXDPMScheduler
- transformers -- T5EncoderModel, AutoTokenizer
- peft -- LoraConfig
I/O Contract
Inputs
| Input | Format | Description |
|---|---|---|
| Pretrained model | HuggingFace checkpoint directory | Directory containing the subdirectories tokenizer/, text_encoder/, transformer/, vae/, and scheduler/. |
| LoRA configuration | Args fields | Rank, alpha, and target modules from the validated configuration. |
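Since every sub-component is loaded from a fixed subfolder, a mispointed model_path can be caught before the slow from_pretrained calls begin. This helper is a sketch and is not part of the finetune package:

```python
from pathlib import Path

# Subfolders that load_components expects inside the checkpoint directory.
REQUIRED_SUBFOLDERS = [
    "tokenizer", "text_encoder", "transformer", "vae", "scheduler",
]

def check_model_path(model_path: str) -> list[str]:
    """Return the required subfolders missing from a checkpoint directory."""
    root = Path(model_path)
    return [name for name in REQUIRED_SUBFOLDERS if not (root / name).is_dir()]
```

An empty return value means the directory layout matches what the loader expects; any listed names indicate an incomplete or wrongly targeted checkpoint.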
Outputs
| Output | Format | Description |
|---|---|---|
| Components object | Components namedtuple | Contains tokenizer, text_encoder, transformer, vae, and scheduler. |
| LoRA-adapted transformer | CogVideoXTransformer3DModel with PEFT adapter | Transformer with the LoRA adapter attached; only LoRA parameters require gradients. |
Usage Examples
Initializing the T2V LoRA Trainer
from finetune.schemas import Args
from finetune.models.cogvideox_t2v.lora_trainer import CogVideoXT2VLoraTrainer
# Parse configuration
args = Args.parse_args()
# Initialize trainer (calls load_components internally)
trainer = CogVideoXT2VLoraTrainer(args=args)
# Components are now loaded and LoRA is injected
# trainer.components.transformer has LoRA adapters attached
# trainer.components.text_encoder and trainer.components.vae are on UNLOAD_LIST
Checking Trainable Parameters
# After LoRA injection, verify trainable parameter count
trainable_params = sum(p.numel() for p in trainer.components.transformer.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in trainer.components.transformer.parameters())
print(f"Trainable: {trainable_params:,} / {total_params:,} ({100 * trainable_params / total_params:.2f}%)")
# Typical output: Trainable: ~50M / ~5B (1.0%)
Related Pages
- Principle:Zai_org_CogVideo_Model_Loading_and_LoRA_Injection
- Environment:Zai_org_CogVideo_Diffusers_Finetuning_Environment
- Heuristic:Zai_org_CogVideo_BF16_FP16_Precision_Selection
- Heuristic:Zai_org_CogVideo_LoRA_Configuration_Tips
- Implementation:Zai_org_CogVideo_Args_Parse_Args
- Implementation:Zai_org_CogVideo_Accelerator_Setup