
Principle:Huggingface Diffusers LoRA Export

From Leeroopedia
Knowledge Sources
Domains Diffusion_Models, LoRA, Model_Serialization
Last Updated 2026-02-13 21:00 GMT

Overview

Saving trained LoRA adapter weights in a diffusers-compatible format involves extracting the adapter state dict from the PEFT-wrapped model, converting it to the diffusers naming convention, and serializing it using safetensors.

Description

After LoRA fine-tuning completes, the trained adapter weights must be saved in a format that can be loaded by the Diffusers pipeline for inference. This involves several steps:

State dict extraction: The LoRA adapter parameters are extracted from the PEFT-wrapped model using get_peft_model_state_dict. This returns only the trainable LoRA parameters (the A and B matrices for each adapted layer), excluding all frozen pretrained weights. This results in a very compact checkpoint (typically 3-50 MB depending on rank and target modules).
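In practice the extraction is done by PEFT's get_peft_model_state_dict. As a dependency-free illustration of what that call returns, the sketch below filters a toy full state dict (names are illustrative) down to just the adapter keys:

```python
def extract_lora_state_dict(full_state_dict):
    """Keep only the LoRA adapter parameters (the lora_A / lora_B
    matrices), dropping all frozen pretrained weights."""
    return {name: tensor for name, tensor in full_state_dict.items()
            if "lora_A" in name or "lora_B" in name}

# Toy state dict: one frozen base weight plus one adapted projection.
full = {
    "base_model.model.mid_block.attn1.to_q.weight": "frozen [320, 320]",
    "base_model.model.mid_block.attn1.to_q.lora_A.weight": "trainable [4, 320]",
    "base_model.model.mid_block.attn1.to_q.lora_B.weight": "trainable [320, 4]",
}
adapter = extract_lora_state_dict(full)  # only the two LoRA matrices remain
```

Because only the A and B matrices survive the filter, the resulting checkpoint is orders of magnitude smaller than the full model.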

Format conversion: PEFT uses its own naming convention for LoRA parameters (e.g., base_model.model.mid_block.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight). Diffusers uses a different convention. The convert_state_dict_to_diffusers function translates between these conventions, ensuring the saved weights can be loaded by pipeline.load_lora_weights().
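The real conversion is handled by diffusers' convert_state_dict_to_diffusers, which covers several source layouts. A minimal sketch of the core renaming for UNet weights (simplified; it only handles the PEFT prefix shown above):

```python
def peft_to_diffusers_keys(state_dict, prefix="unet"):
    """Rename PEFT-style keys (base_model.model.<path>) to
    diffusers-style keys (<prefix>.<path>)."""
    converted = {}
    for name, tensor in state_dict.items():
        if name.startswith("base_model.model."):
            name = prefix + "." + name[len("base_model.model."):]
        converted[name] = tensor
    return converted

key = "base_model.model.mid_block.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight"
renamed = peft_to_diffusers_keys({key: None})
# key becomes "unet.mid_block.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight"
```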

Safetensors serialization: By default, weights are saved in the safetensors format rather than PyTorch's pickle-based format. Safetensors provides security benefits (no arbitrary code execution), faster loading, and memory-mapped access. The saved file is typically named pytorch_lora_weights.safetensors.

Multi-component saving: When both the UNet and text encoder have LoRA adapters, their state dicts are saved together in a single file with prefixed keys (unet. and text_encoder.) to distinguish them during loading.
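The prefixing can be sketched as a simple dictionary merge (in practice the pipeline's save_lora_weights, passed unet_lora_layers and text_encoder_lora_layers, performs this packing and the serialization for you):

```python
def pack_lora_components(unet_lora, text_encoder_lora):
    """Merge UNet and text-encoder adapter dicts into one state dict,
    prefixing keys so the loader can route each tensor to the right
    pipeline component."""
    packed = {f"unet.{k}": v for k, v in unet_lora.items()}
    packed.update({f"text_encoder.{k}": v for k, v in text_encoder_lora.items()})
    return packed

# Toy single-entry adapter dicts for each component.
merged = pack_lora_components(
    {"mid_block.attn1.to_q.lora_A.weight": "A"},
    {"text_model.encoder.layers.0.self_attn.q_proj.lora_A.weight": "B"},
)
```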

Usage

Use LoRA export when:

  • Training is complete and you want to save the adapter for later inference
  • You need to share fine-tuned adapters (they are small and portable)
  • You are saving intermediate checkpoints during training
  • You want to push trained adapters to the Hugging Face Hub

Theoretical Basis

State Dict Structure

A saved LoRA state dict contains only the adapter matrices for each adapted layer:

State dict keys for a rank-4 LoRA on UNet attention:

unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight  [4, 320]
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.lora_B.weight  [320, 4]
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.lora_A.weight  [4, 320]
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.lora_B.weight  [320, 4]
...

Per-layer parameters = r * (d_in + d_out)  (lora_A is [r, d_in], lora_B is [d_out, r])
For SD 1.5 UNet with rank=4, targeting q/k/v/out: ~0.8M parameters (~3 MB in fp32)
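The total can be checked by summing r * (d_in + d_out) over every adapted projection. The block widths, transformer counts, and cross-attention dimension below are the standard SD 1.5 UNet values, stated here as assumptions:

```python
R = 4            # LoRA rank
CROSS_DIM = 768  # CLIP text width: input dim of cross-attention k/v projections
# (width, number of Transformer2DModel blocks at that width) in the SD 1.5 UNet:
# down blocks contribute 2 each, up blocks 3 each, plus the mid block at 1280.
WIDTHS = [(320, 5), (640, 5), (1280, 6)]

def lora_params(d_in, d_out, r=R):
    # lora_A: [r, d_in], lora_B: [d_out, r]
    return r * (d_in + d_out)

total = 0
for d, n in WIDTHS:
    self_attn = 4 * lora_params(d, d)               # attn1: q, k, v, out (all d -> d)
    cross_attn = (2 * lora_params(d, d)             # attn2: q, out (d -> d)
                  + 2 * lora_params(CROSS_DIM, d))  # attn2: k, v (768 -> d)
    total += n * (self_attn + cross_attn)

print(total)                       # ~0.8M parameters
print(total * 4 / 1e6, "MB fp32")  # ~3 MB
```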

PEFT vs. Diffusers Naming

The conversion between naming conventions:

PEFT format:
  base_model.model.{module_path}.lora_A.weight
  base_model.model.{module_path}.lora_B.weight

Diffusers format:
  unet.{module_path}.lora_A.weight
  unet.{module_path}.lora_B.weight

Safetensors Format

Safetensors stores tensors with a simple header + data layout:

File structure:
  [8 bytes: header_size]
  [header: JSON with tensor metadata (name, dtype, shape, offset)]
  [data: raw tensor bytes, contiguous]

Benefits:
  - No pickle: immune to arbitrary code execution attacks
  - Memory-mapped: tensors can be loaded without reading the entire file
  - Fast loading: no deserialization overhead
  - Lazy loading: individual tensors can be loaded on demand
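The layout above is simple enough to demonstrate end to end. The sketch below writes and partially reads a file in that layout; real checkpoints should be produced with the safetensors library, which adds validation, optional metadata, and alignment details this toy version omits:

```python
import json
import os
import struct
import tempfile

def write_safetensors(path, tensors):
    """Minimal writer for the header + data layout described above.
    `tensors` maps name -> (dtype_str, shape, raw_bytes)."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    hbytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hbytes)))  # 8-byte little-endian header size
        f.write(hbytes)
        for raw in blobs:
            f.write(raw)

def read_one_tensor(path, name):
    """Fetch a single tensor's bytes without reading the whole file --
    the lazy-loading property noted above."""
    with open(path, "rb") as f:
        (hsize,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(hsize))
        start, end = header[name]["data_offsets"]
        f.seek(8 + hsize + start)
        return f.read(end - start)

# Round-trip a fake rank-4 lora_A matrix stored as raw fp32 bytes.
path = os.path.join(tempfile.mkdtemp(), "pytorch_lora_weights.safetensors")
raw = struct.pack("<%df" % (4 * 320), *([0.0] * (4 * 320)))
write_safetensors(path, {"unet.attn1.to_q.lora_A.weight": ("F32", [4, 320], raw)})
assert read_one_tensor(path, "unet.attn1.to_q.lora_A.weight") == raw
```

Because each tensor's byte range is recorded in the JSON header, a loader can seek directly to the tensor it needs, which is what makes memory-mapped and lazy access possible.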

Related Pages

Implemented By

Uses Heuristic
