Implementation:Zai_org_CogVideo_CogVideoX_LoRA_Trainer_Load_Components
| Implementation Metadata | |
|---|---|
| Name | CogVideoX_LoRA_Trainer_Load_Components |
| Type | API Doc |
| Category | Model_Architecture |
| Domains | Video_Generation, Fine_Tuning, Diffusion_Models |
| Knowledge Sources | CogVideo Repository, CogVideoX Paper, LoRA Paper |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
CogVideoX_LoRA_Trainer_Load_Components is a concrete tool for loading CogVideoX model components and configuring LoRA adapters, provided by the CogVideo finetune package.
Description
This implementation provides the load_components method on both CogVideoXT2VLoraTrainer and CogVideoXI2VLoraTrainer classes. The method loads all pretrained model sub-components (tokenizer, text encoder, transformer, VAE, scheduler) from a HuggingFace-format checkpoint directory. After loading, the prepare_trainable_parameters method in the base Trainer class applies a LoraConfig to the transformer, attaching low-rank adapters to the specified attention modules.
Usage
Use this implementation when initializing a CogVideoX LoRA fine-tuning session. The load_components method is called automatically by the trainer during initialization, and the LoRA configuration (rank, alpha, target modules) is then applied from the validated Args.
Code Reference
Source Location
- finetune/models/cogvideox_t2v/lora_trainer.py:L26-48 -- T2V load_components
- finetune/models/cogvideox_i2v/lora_trainer.py:L27-49 -- I2V load_components
- finetune/trainer.py:L223-253 -- prepare_trainable_parameters with LoRA injection
Signature
class CogVideoXT2VLoraTrainer(Trainer):
    UNLOAD_LIST = ["text_encoder", "vae"]

    @override
    def load_components(self) -> Components:
        tokenizer = AutoTokenizer.from_pretrained(
            self.args.model_path, subfolder="tokenizer"
        )
        text_encoder = T5EncoderModel.from_pretrained(
            self.args.model_path, subfolder="text_encoder"
        )
        transformer = CogVideoXTransformer3DModel.from_pretrained(
            self.args.model_path, subfolder="transformer"
        )
        vae = AutoencoderKLCogVideoX.from_pretrained(
            self.args.model_path, subfolder="vae"
        )
        scheduler = CogVideoXDPMScheduler.from_pretrained(
            self.args.model_path, subfolder="scheduler"
        )
        return Components(tokenizer, text_encoder, transformer, vae, scheduler)
LoRA configuration applied in trainer.py:
transformer_lora_config = LoraConfig(
    r=args.rank,
    lora_alpha=args.lora_alpha,
    init_lora_weights=True,
    target_modules=args.target_modules,
)
transformer.add_adapter(transformer_lora_config)
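To make the effect of the injected adapter concrete, here is a minimal sketch of the computation LoRA adds to each targeted projection, using scalar stand-ins for the weight matrices. The function and values are illustrative only and are not part of the CogVideo codebase:

```python
# Sketch of an adapted projection: output = W x + (alpha / r) * B(A(x)).
# Scalars stand in for the frozen weight W and the low-rank factors A, B.
def lora_forward(x, w, a, b, rank=128, alpha=64):
    base = w * x                 # frozen pretrained projection
    delta = b * (a * x)          # low-rank update, B @ A @ x in matrix form
    return base + (alpha / rank) * delta

# With init_lora_weights=True, PEFT zero-initializes the B factor, so the
# adapter starts as a no-op and the output equals the base projection.
assert lora_forward(2.0, 3.0, a=0.7, b=0.0) == 6.0
```

Because the scale is alpha / rank (64 / 128 = 0.5 with the defaults), raising the rank without raising alpha shrinks the adapter's per-parameter contribution.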
Import
from finetune.models.cogvideox_t2v.lora_trainer import CogVideoXT2VLoraTrainer
from finetune.models.cogvideox_i2v.lora_trainer import CogVideoXI2VLoraTrainer
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | Path | required | Path to the pretrained CogVideoX model (HuggingFace format with subdirectories). |
| r (rank) | int | 128 | Rank of the LoRA low-rank matrices. |
| lora_alpha | int | 64 | LoRA scaling factor (effective scale = alpha / rank). |
| target_modules | List[str] | ["to_q", "to_k", "to_v", "to_out.0"] | Transformer attention modules to which LoRA adapters are applied. |
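The parameters above determine both the adapter's scale and its size. The following back-of-the-envelope sketch estimates the parameter count added per attention block under the defaults; the hidden width of 3072 is an assumption about the transformer's projection size, not a value read from the checkpoint:

```python
# Estimating LoRA adapter size under the default configuration.
rank = 128        # default r
lora_alpha = 64   # default lora_alpha
hidden = 3072     # ASSUMED attention projection width, for illustration

# Effective scale applied to the adapter output.
scale = lora_alpha / rank            # 0.5

# Each adapted square Linear gains two factors: A (rank x hidden) and
# B (hidden x rank).
params_per_module = 2 * rank * hidden

# Four target modules (to_q, to_k, to_v, to_out.0) per attention block.
params_per_block = 4 * params_per_module

print(scale, params_per_module, params_per_block)
```

Multiplying by the number of transformer blocks gives a rough total, which is how the "~50M trainable out of ~5B" figure in the usage example arises.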
External Dependencies
- diffusers -- CogVideoXPipeline, AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXDPMScheduler
- transformers -- T5EncoderModel, AutoTokenizer
- peft -- LoraConfig
I/O Contract
Inputs
| Input | Format | Description |
|---|---|---|
| Pretrained model | HuggingFace checkpoint directory | Directory containing the subdirectories tokenizer/, text_encoder/, transformer/, vae/, and scheduler/. |
| LoRA configuration | Args fields | Rank, alpha, and target modules from the validated configuration. |
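Since every sub-component is loaded from a fixed subfolder, a mispointed model_path can be caught before the slow from_pretrained calls begin. This helper is a sketch and is not part of the finetune package:

```python
from pathlib import Path

# Subfolders that load_components expects inside the checkpoint directory.
REQUIRED_SUBFOLDERS = [
    "tokenizer", "text_encoder", "transformer", "vae", "scheduler",
]

def check_model_path(model_path: str) -> list[str]:
    """Return the required subfolders missing from a checkpoint directory."""
    root = Path(model_path)
    return [name for name in REQUIRED_SUBFOLDERS if not (root / name).is_dir()]
```

An empty return value means the directory layout matches what the loader expects; any listed names indicate an incomplete or wrongly targeted checkpoint.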
Outputs
| Output | Format | Description |
|---|---|---|
| Components object | Components namedtuple | Contains tokenizer, text_encoder, transformer, vae, and scheduler. |
| LoRA-adapted transformer | CogVideoXTransformer3DModel with PEFT adapter | Transformer with the LoRA adapter attached; only LoRA parameters require gradients. |
Usage Examples
Initializing the T2V LoRA Trainer
from finetune.schemas import Args
from finetune.models.cogvideox_t2v.lora_trainer import CogVideoXT2VLoraTrainer
# Parse configuration
args = Args.parse_args()
# Initialize trainer (calls load_components internally)
trainer = CogVideoXT2VLoraTrainer(args=args)
# Components are now loaded and LoRA is injected
# trainer.components.transformer has LoRA adapters attached
# trainer.components.text_encoder and trainer.components.vae are on UNLOAD_LIST
Checking Trainable Parameters
# After LoRA injection, verify trainable parameter count
trainable_params = sum(p.numel() for p in trainer.components.transformer.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in trainer.components.transformer.parameters())
print(f"Trainable: {trainable_params:,} / {total_params:,} ({100 * trainable_params / total_params:.2f}%)")
# Typical output: Trainable: ~50M / ~5B (1.0%)
Related Pages
- Principle:Zai_org_CogVideo_Model_Loading_and_LoRA_Injection
- Environment:Zai_org_CogVideo_Diffusers_Finetuning_Environment
- Heuristic:Zai_org_CogVideo_BF16_FP16_Precision_Selection
- Heuristic:Zai_org_CogVideo_LoRA_Configuration_Tips
- Implementation:Zai_org_CogVideo_Args_Parse_Args
- Implementation:Zai_org_CogVideo_Accelerator_Setup