
Implementation:Zai org CogVideo CogVideoX LoRA Trainer Load Components

From Leeroopedia


Implementation Metadata
Name CogVideoX_LoRA_Trainer_Load_Components
Type API Doc
Category Model_Architecture
Domains Video_Generation, Fine_Tuning, Diffusion_Models
Knowledge Sources CogVideo Repository, CogVideoX Paper, LoRA Paper
Last Updated 2026-02-10 00:00 GMT

Overview

CogVideoX_LoRA_Trainer_Load_Components is a concrete tool for loading CogVideoX model components and configuring LoRA adapters, provided by the CogVideo finetune package.

Description

This implementation provides the load_components method on both CogVideoXT2VLoraTrainer and CogVideoXI2VLoraTrainer classes. The method loads all pretrained model sub-components (tokenizer, text encoder, transformer, VAE, scheduler) from a HuggingFace-format checkpoint directory. After loading, the prepare_trainable_parameters method in the base Trainer class applies a LoraConfig to the transformer, attaching low-rank adapters to the specified attention modules.
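The Components container returned by load_components can be pictured as a simple record holding the five sub-modules. The sketch below is illustrative only: the field names follow the signature shown later on this page, but the exact class definition in the finetune package (namedtuple, dataclass, or pydantic model) is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for the finetune package's Components container;
# the real definition may differ in kind, but exposes these five fields.
Components = namedtuple(
    "Components",
    ["tokenizer", "text_encoder", "transformer", "vae", "scheduler"],
)

# load_components returns one of these, each field populated from the
# matching checkpoint subfolder (placeholder strings used here).
c = Components("tok", "t5", "dit", "vae", "sched")
assert c.transformer == "dit"
```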

Usage

Use when initializing a CogVideoX LoRA fine-tuning session. The load_components method is called automatically by the trainer during initialization. The LoRA configuration is applied based on the Args configuration (rank, alpha, target modules).

Code Reference

Source Location

  • finetune/models/cogvideox_t2v/lora_trainer.py:L26-48 -- T2V load_components
  • finetune/models/cogvideox_i2v/lora_trainer.py:L27-49 -- I2V load_components
  • finetune/trainer.py:L223-253 -- prepare_trainable_parameters with LoRA injection

Signature

class CogVideoXT2VLoraTrainer(Trainer):
    UNLOAD_LIST = ["text_encoder", "vae"]

    @override
    def load_components(self) -> Components:
        tokenizer = AutoTokenizer.from_pretrained(
            self.args.model_path, subfolder="tokenizer"
        )
        text_encoder = T5EncoderModel.from_pretrained(
            self.args.model_path, subfolder="text_encoder"
        )
        transformer = CogVideoXTransformer3DModel.from_pretrained(
            self.args.model_path, subfolder="transformer"
        )
        vae = AutoencoderKLCogVideoX.from_pretrained(
            self.args.model_path, subfolder="vae"
        )
        scheduler = CogVideoXDPMScheduler.from_pretrained(
            self.args.model_path, subfolder="scheduler"
        )
        return Components(tokenizer, text_encoder, transformer, vae, scheduler)

LoRA configuration applied in trainer.py:

transformer_lora_config = LoraConfig(
    r=args.rank,
    lora_alpha=args.lora_alpha,
    init_lora_weights=True,
    target_modules=args.target_modules,
)
transformer.add_adapter(transformer_lora_config)

Import

from finetune.models.cogvideox_t2v.lora_trainer import CogVideoXT2VLoraTrainer
from finetune.models.cogvideox_i2v.lora_trainer import CogVideoXI2VLoraTrainer

Key Parameters

Parameter Type Default Description
model_path Path required Path to pretrained CogVideoX model (HuggingFace format with subdirectories).
r (rank) int 128 Rank of LoRA low-rank matrices.
lora_alpha int 64 LoRA scaling factor (effective scale = alpha/rank).
target_modules List[str] ["to_q", "to_k", "to_v", "to_out.0"] Transformer attention modules that receive LoRA adapters.
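The table notes that the effective adapter scale is alpha/rank, so with the defaults above (rank 128, alpha 64) the LoRA contribution is applied at half strength. A minimal sketch of that arithmetic:

```python
# PEFT-style LoRA merges as W' = W + (lora_alpha / r) * B @ A,
# so the adapter's contribution is scaled by lora_alpha / r.
def lora_scale(rank: int, lora_alpha: int) -> float:
    return lora_alpha / rank

# Defaults from the table above: r=128, lora_alpha=64 -> scale 0.5.
assert lora_scale(128, 64) == 0.5
# Setting lora_alpha equal to rank gives unit scale.
assert lora_scale(128, 128) == 1.0
```

Doubling the rank without also doubling lora_alpha therefore halves the effective adapter strength, which is worth keeping in mind when sweeping rank.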

External Dependencies

  • diffusers -- CogVideoXPipeline, AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXDPMScheduler
  • transformers -- T5EncoderModel, AutoTokenizer
  • peft -- LoraConfig

I/O Contract

Inputs

Input Format Description
Pretrained model HuggingFace checkpoint directory Directory containing subdirectories: tokenizer/, text_encoder/, transformer/, vae/, scheduler/.
LoRA configuration Args fields Rank, alpha, and target modules from validated configuration.
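Since each from_pretrained call in load_components targets one checkpoint subfolder, a missing directory fails partway through loading. The helper below is a hypothetical pre-flight check, not part of the finetune package, that validates the layout listed in the table above using only the standard library:

```python
from pathlib import Path

# Subfolders load_components expects inside the HuggingFace-format checkpoint.
REQUIRED_SUBFOLDERS = ("tokenizer", "text_encoder", "transformer", "vae", "scheduler")

def missing_subfolders(model_path: str) -> list[str]:
    """Return the required checkpoint subfolders absent from model_path.

    Illustrative helper only: each from_pretrained(..., subfolder=...) call
    in load_components expects the corresponding directory to exist.
    """
    root = Path(model_path)
    return [name for name in REQUIRED_SUBFOLDERS if not (root / name).is_dir()]
```

Calling this before trainer construction turns an opaque mid-load error into an explicit list of what the checkpoint directory is missing.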

Outputs

Output Format Description
Components object Components namedtuple Contains tokenizer, text_encoder, transformer, vae, scheduler.
LoRA-adapted transformer CogVideoXTransformer3DModel with PEFT adapter Transformer with LoRA adapter attached; only LoRA parameters require gradients.

Usage Examples

Initializing the T2V LoRA Trainer

from finetune.schemas import Args
from finetune.models.cogvideox_t2v.lora_trainer import CogVideoXT2VLoraTrainer

# Parse configuration
args = Args.parse_args()

# Initialize trainer (calls load_components internally)
trainer = CogVideoXT2VLoraTrainer(args=args)

# Components are now loaded and LoRA is injected
# trainer.components.transformer has LoRA adapters attached
# trainer.components.text_encoder and trainer.components.vae are on UNLOAD_LIST

Checking Trainable Parameters

# After LoRA injection, verify trainable parameter count
trainable_params = sum(p.numel() for p in trainer.components.transformer.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in trainer.components.transformer.parameters())
print(f"Trainable: {trainable_params:,} / {total_params:,} ({100 * trainable_params / total_params:.2f}%)")
# Typical output: Trainable: ~50M / ~5B (1.0%)
