Principle: Hugging Face Diffusers LoRA Export
| Knowledge Sources | |
|---|---|
| Domains | Diffusion_Models, LoRA, Model_Serialization |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
Saving trained LoRA adapter weights in a diffusers-compatible format involves extracting the adapter state dict from the PEFT-wrapped model, converting it to the diffusers naming convention, and serializing it using safetensors.
Description
After LoRA fine-tuning completes, the trained adapter weights must be saved in a format that can be loaded by the Diffusers pipeline for inference. This involves several steps:
State dict extraction: The LoRA adapter parameters are extracted from the PEFT-wrapped model using get_peft_model_state_dict. This returns only the trainable LoRA parameters (the A and B matrices for each adapted layer), excluding all frozen pretrained weights. This results in a very compact checkpoint (typically 3-50 MB depending on rank and target modules).
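Conceptually, the extraction step just filters the model's parameters down to the adapter matrices. A minimal sketch of that idea (the real `get_peft_model_state_dict` also resolves adapter names and handles `modules_to_save`; the toy dict below is illustrative):

```python
# Sketch of adapter extraction: keep only the LoRA A/B matrices and
# drop every frozen base weight. This mimics the effect of
# peft.get_peft_model_state_dict on a LoRA-wrapped model.
def extract_lora_state_dict(full_state_dict):
    return {
        name: tensor
        for name, tensor in full_state_dict.items()
        if "lora_A" in name or "lora_B" in name
    }

# Toy state dict: one frozen base weight plus one adapted layer.
full = {
    "unet.mid_block.attn1.to_q.weight": "frozen [320, 320]",
    "unet.mid_block.attn1.to_q.lora_A.weight": "trainable [4, 320]",
    "unet.mid_block.attn1.to_q.lora_B.weight": "trainable [320, 4]",
}
adapter = extract_lora_state_dict(full)
print(sorted(adapter))  # only the two lora_* keys survive
```

Because only the A/B matrices survive the filter, the resulting checkpoint scales with rank and the number of adapted layers, not with the size of the base model.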
Format conversion: PEFT uses its own naming convention for LoRA parameters (e.g., base_model.model.mid_block.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight). Diffusers uses a different convention. The convert_state_dict_to_diffusers function translates between these conventions, ensuring the saved weights can be loaded by pipeline.load_lora_weights().
Safetensors serialization: By default, weights are saved in the safetensors format rather than PyTorch's pickle-based format. Safetensors provides security benefits (no arbitrary code execution), faster loading, and memory-mapped access. The saved file is typically named pytorch_lora_weights.safetensors.
Multi-component saving: When both the UNet and text encoder have LoRA adapters, their state dicts are saved together in a single file with prefixed keys (unet. and text_encoder.) to distinguish them during loading.
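The prefixing itself is a simple key rewrite, sketched below with toy placeholder values (Diffusers' `save_lora_weights` applies the prefixes internally; the helper name here is hypothetical):

```python
# Sketch of multi-component packing: each component's keys get a
# distinguishing prefix so the loader can route them back to the
# right submodule at inference time.
def pack_lora_components(unet_sd, text_encoder_sd):
    packed = {f"unet.{key}": value for key, value in unet_sd.items()}
    packed.update(
        {f"text_encoder.{key}": value for key, value in text_encoder_sd.items()}
    )
    return packed

packed = pack_lora_components(
    {"mid_block.attn1.to_q.lora_A.weight": "[4, 320]"},
    {"text_model.encoder.layers.0.self_attn.q_proj.lora_A.weight": "[4, 768]"},
)
print(sorted(packed))
```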
Usage
Use LoRA export when:
- Training is complete and you want to save the adapter for later inference
- You need to share fine-tuned adapters (they are small and portable)
- You are saving intermediate checkpoints during training
- You want to push trained adapters to the Hugging Face Hub
Theoretical Basis
State Dict Structure
A saved LoRA state dict contains only the adapter matrices for each adapted layer:
State dict keys for a rank-4 LoRA on UNet attention:
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight [4, 320]
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.lora_B.weight [320, 4]
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.lora_A.weight [4, 320]
unet.down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.lora_B.weight [320, 4]
...
Total parameters ≈ num_adapted_layers * 2 * r * d (exact for square d×d projections; in general a layer mapping d_in → d_out contributes r * (d_in + d_out))
For the SD 1.5 UNet with rank=4, targeting the q/k/v/out projections: roughly 0.8M parameters (~3 MB in fp32)
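The per-layer count can be checked directly from the A/B shapes (a small worked example; the dimensions match the rank-4, 320-dim projection shown above):

```python
# Parameter count for one LoRA-adapted linear layer:
#   lora_A: [r, d_in], lora_B: [d_out, r]
def lora_params(d_in, d_out, r):
    return r * d_in + d_out * r

# Square 320-dim self-attention projection (e.g. to_q), rank 4:
print(lora_params(320, 320, 4))  # 2560 = 2 * r * d

# Cross-attention to_k mapping a 768-dim text embedding to 320 dims:
print(lora_params(768, 320, 4))  # 4352 = r * (d_in + d_out)
```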
PEFT vs. Diffusers Naming
The conversion between naming conventions:
PEFT format:
base_model.model.{module_path}.lora_A.weight
base_model.model.{module_path}.lora_B.weight
Diffusers format:
unet.{module_path}.lora_A.weight
unet.{module_path}.lora_B.weight
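The translation shown above amounts to stripping the PEFT wrapper prefix and prepending the component name. A simplified sketch (the real `diffusers.utils.convert_state_dict_to_diffusers` handles many more key patterns than this):

```python
# Sketch of the PEFT -> Diffusers key translation for a UNet adapter:
# drop the "base_model.model." wrapper prefix, then prepend the
# component prefix used in the saved file.
def peft_key_to_diffusers(key, component="unet"):
    prefix = "base_model.model."
    if key.startswith(prefix):
        key = key[len(prefix):]
    return f"{component}.{key}"

print(peft_key_to_diffusers(
    "base_model.model.mid_block.attentions.0"
    ".transformer_blocks.0.attn1.to_q.lora_A.weight"
))
```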
Safetensors Format
Safetensors stores tensors with a simple header + data layout:
File structure:
[8 bytes: header_size, little-endian unsigned 64-bit integer]
[header: JSON with tensor metadata (name, dtype, shape, offset)]
[data: raw tensor bytes, contiguous]
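The layout above can be reproduced in a few lines of pure Python. A simplified writer/parser for illustration only (the real format additionally supports an optional `__metadata__` entry and header padding):

```python
import json
import struct

# Build a safetensors-like blob: 8-byte little-endian header size,
# JSON header with per-tensor dtype/shape/offsets, then raw data.
def write_safetensors_like(tensors):
    """tensors: name -> (dtype_str, shape, raw_bytes). Returns file bytes."""
    header, data, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(raw)],
        }
        data += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

# One fp32 [4, 320] tensor: 4 * 320 * 4 bytes of (zeroed) data.
blob = write_safetensors_like({
    "lora_A": ("F32", (4, 320), b"\x00" * (4 * 320 * 4)),
})

# Parse it back: read the size prefix, then decode the JSON header.
(header_size,) = struct.unpack("<Q", blob[:8])
header = json.loads(blob[8:8 + header_size])
print(header["lora_A"]["shape"])  # [4, 320]
```

Because every tensor's byte range is recorded in the header, a reader can seek straight to one tensor's data without touching the rest of the file.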
Benefits:
- No pickle: immune to arbitrary code execution attacks
- Memory-mapped: tensors can be loaded without reading the entire file
- Fast loading: no deserialization overhead
- Lazy loading: individual tensors can be loaded on demand