Principle:Microsoft LoRA LoRA Checkpoint Saving

Knowledge Sources	Microsoft LoRA LoRA
Domains	Serialization, Parameter_Efficient_Fine_Tuning
Last Updated	2026-02-10 05:00 GMT

Overview

Principle of saving only LoRA parameters as compact checkpoints, separate from the frozen pretrained base model weights.

Description

Because LoRA fine-tuning updates only a small fraction of a model's parameters, saving the entire model state dict at every checkpoint is wasteful. A LoRA checkpoint instead contains only the LoRA matrices (lora_A, lora_B) and, optionally, bias parameters. The resulting files are typically hundreds of times smaller than a full checkpoint and can be stored, transmitted, and versioned independently of the base model.

Usage

Save a LoRA checkpoint at every checkpoint interval during training and again when training ends. To restore a checkpoint, first load the pretrained base model, then call load_state_dict with the LoRA state dict and strict=False so that the LoRA parameters are overlaid on the base weights.

Theoretical Basis

Checkpoint Size Reduction

LoRA checkpoints are a tiny fraction of full model checkpoints because only the low-rank matrices are saved:

Model	Full Checkpoint Size	LoRA Checkpoint Size (r=4)	Reduction
GPT-2 Small (124M)	~500 MB	~1.4 MB	~350x
GPT-2 Medium (355M)	~1.4 GB	~4.7 MB	~300x
GPT-2 Large (774M)	~3.1 GB	~9.4 MB	~330x

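These approximate sizes assume LoRA is applied to the attention query (Q) and value (V) projections with rank r=4. The exact file size also depends on the storage dtype and on serialization overhead.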
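As a rough sketch of where the savings come from: each adapted d x d projection gains a matrix A (r x d) and a matrix B (d x r), that is, 2·r·d extra parameters. The helper names and the 6-layer, 512-wide configuration below are hypothetical and chosen only for illustration. The estimate counts raw fp32 tensor bytes, so real checkpoint files will be somewhat larger.

```python
def lora_param_count(n_layers, d_model, rank, n_adapted_per_layer=2):
    """Count LoRA parameters when square d_model x d_model projections are adapted."""
    # A is (rank x d_model), B is (d_model x rank): 2 * rank * d_model per projection.
    per_projection = 2 * rank * d_model
    return n_layers * n_adapted_per_layer * per_projection


def size_mb(n_params, bytes_per_param=4):
    """Raw tensor bytes only (fp32 by default); 1 MB is taken as 10**6 bytes."""
    return n_params * bytes_per_param / 1e6


# Hypothetical configuration, for illustration only (Q and V adapted, r=4).
n = lora_param_count(n_layers=6, d_model=512, rank=4)
print(n, "LoRA parameters,", size_mb(n), "MB")
```

The count grows linearly in the rank and in the number of adapted projections, which is why the checkpoint stays small even for large base models.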

Multiple Adapters, One Base Model

A key advantage of compact LoRA checkpoints is that multiple task-specific adapters can share a single base model. For example, one pretrained GPT-2 checkpoint (~500 MB) can be combined with dozens of LoRA adapters (~1.4 MB each) for different tasks. This is far more storage-efficient than maintaining separate full fine-tuned models for each task.
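The storage argument can be made concrete with a short calculation. This sketch reuses the approximate GPT-2 Small sizes quoted above; the task count of 24 and the function name are arbitrary, illustrative assumptions.

```python
BASE_MB = 500.0     # approximate full GPT-2 Small checkpoint, as quoted above
ADAPTER_MB = 1.4    # approximate r=4 LoRA checkpoint, as quoted above


def storage_mb(n_tasks):
    """Return (separate full fine-tuned models, one base plus n adapters) in MB."""
    full_copies = n_tasks * BASE_MB
    shared_base = BASE_MB + n_tasks * ADAPTER_MB
    return full_copies, shared_base


full_copies, shared_base = storage_mb(24)  # 24 tasks: an illustrative count
print(f"full fine-tunes: {full_copies:.1f} MB, base + adapters: {shared_base:.1f} MB")
```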

Bias Filtering Modes

The checkpoint saving function (lora_state_dict) supports the same three bias modes as mark_only_lora_as_trainable, so that the parameters written to disk are exactly the ones that were trained:

Mode	Saved Parameters
none	Only lora_A and lora_B matrices
all	lora_A, lora_B, and all bias parameters in the model
lora_only	lora_A, lora_B, and biases from LoRA-augmented layers only

Critical: The bias mode used for checkpoint saving must match the mode passed to mark_only_lora_as_trainable. If the saving mode is stricter (for example, none after training with all), trained bias values are missing from the checkpoint. If the saving mode is more permissive, the checkpoint carries extra bias parameters that were never trained.
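The three modes amount to filtering state dict entries by key name. The following is a minimal sketch of that filtering on a plain dictionary, not the library's own code: filter_lora_state is a hypothetical helper, and the toy key names and placeholder values are invented for illustration.

```python
def filter_lora_state(state, bias="none"):
    """Hypothetical helper: select checkpoint entries by key name."""
    if bias == "none":
        return {k: v for k, v in state.items() if "lora_" in k}
    if bias == "all":
        return {k: v for k, v in state.items() if "lora_" in k or "bias" in k}
    if bias == "lora_only":
        kept = {}
        for k, v in state.items():
            if "lora_" in k:
                kept[k] = v
                # The bias of a LoRA-augmented layer shares that layer's name prefix.
                bias_key = k.split("lora_")[0] + "bias"
                if bias_key in state:
                    kept[bias_key] = state[bias_key]
        return kept
    raise ValueError(f"unknown bias mode: {bias!r}")


# Toy state dict: "attn" is LoRA-augmented, "mlp" is not (names are invented).
state = {
    "attn.weight": 0, "attn.bias": 1, "attn.lora_A": 2, "attn.lora_B": 3,
    "mlp.weight": 4, "mlp.bias": 5,
}
for mode in ("none", "all", "lora_only"):
    print(mode, sorted(filter_lora_state(state, mode)))
```

In this toy example, mlp.bias is kept only under all, which is the case where a mismatch with the training-time mode would silently drop or add parameters.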

Checkpoint Loading Pattern

LoRA checkpoints are loaded in two steps:

1. Load the base pretrained model (full weights).
2. Apply the LoRA state dict using load_state_dict(lora_dict, strict=False).

The strict=False flag is required because the LoRA state dict contains only a subset of the model's parameters. With the default strict=True, PyTorch raises a RuntimeError reporting the missing keys.
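The two-step pattern can be sketched end to end in plain PyTorch. TinyLoRALinear is a hypothetical stand-in for a LoRA-augmented layer, the name filter stands in for the library's saving function, and an in-memory buffer replaces the checkpoint file; all of these are assumptions made to keep the example self-contained.

```python
import io

import torch
import torch.nn as nn


class TinyLoRALinear(nn.Module):
    """Hypothetical stand-in for a LoRA-augmented linear layer."""

    def __init__(self, d_in=8, d_out=8, r=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in))  # base weight
        self.bias = nn.Parameter(torch.zeros(d_out))
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))

    def forward(self, x):
        return x @ (self.weight + self.lora_B @ self.lora_A).T + self.bias


# Saving side: pretend lora_B was trained, then keep only the LoRA entries.
trained = TinyLoRALinear()
with torch.no_grad():
    trained.lora_B.fill_(0.5)
lora_dict = {k: v for k, v in trained.state_dict().items() if "lora_" in k}
buffer = io.BytesIO()  # in-memory stand-in for a checkpoint file
torch.save(lora_dict, buffer)
buffer.seek(0)

# Loading side, step 1: a fresh model that receives the base weights.
restored = TinyLoRALinear()
base_dict = {k: v for k, v in trained.state_dict().items() if "lora_" not in k}
restored.load_state_dict(base_dict, strict=False)

# Step 2: overlay the LoRA parameters; base keys are reported as missing, not fatal.
result = restored.load_state_dict(torch.load(buffer), strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

Inspecting result.unexpected_keys after loading is a cheap check: a non-empty list usually means the checkpoint was produced for a different model layout.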

Related Pages

Implemented By

Implementation:Microsoft_LoRA_Lora_State_Dict
