Implementation: Microsoft LoRA Mark Only LoRA as Trainable
| Knowledge Sources | |
|---|---|
| Domains | Training, Parameter_Efficient_Fine_Tuning |
| Last Updated | 2026-02-10 05:00 GMT |
Overview
Utility function that freezes all model parameters except LoRA matrices and optionally biases.
Description
The mark_only_lora_as_trainable function iterates over all named parameters in a model and sets requires_grad = False for every parameter whose name does not contain the string "lora_". It then optionally re-enables gradients for bias parameters based on the specified bias mode. This is the mechanism that ensures only LoRA parameters receive gradient updates during training.
Usage
Call this function once after model construction (with LoRA layers in place) and before creating the optimizer. The bias parameter should match the bias mode used in lora_state_dict for checkpoint consistency.
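To make the checkpoint-consistency point concrete, the sketch below pairs the bias mode with a name-based filter that mimics what lora_state_dict keeps for each mode. Both DemoLayer and filter_lora_sketch are hypothetical stand-ins for illustration, not part of loralib; the real lora_state_dict also supports 'lora_only'.

```python
import torch
import torch.nn as nn

class DemoLayer(nn.Module):
    """Hypothetical stand-in for a LoRA-augmented linear layer."""
    def __init__(self, in_f: int, out_f: int, r: int):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(out_f, in_f))
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.lora_A = nn.Parameter(torch.zeros(r, in_f))
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))

def filter_lora_sketch(model: nn.Module, bias: str = 'none') -> dict:
    """Keep only the state-dict entries that were trainable under this bias mode."""
    sd = model.state_dict()
    if bias == 'none':
        return {k: v for k, v in sd.items() if 'lora_' in k}
    if bias == 'all':
        return {k: v for k, v in sd.items() if 'lora_' in k or 'bias' in k}
    # The real lora_state_dict also implements 'lora_only'
    raise NotImplementedError(bias)

model = DemoLayer(4, 4, r=2)
# Use the same bias mode when freezing and when checkpointing:
kept = sorted(filter_lora_sketch(model, bias='none'))
print(kept)  # ['lora_A', 'lora_B']
```

If the two bias modes disagree, the saved checkpoint either drops trained bias values or stores biases that were never updated.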
Code Reference
Source Location
- Repository: microsoft/LoRA
- File: loralib/utils.py
- Lines: 13-30
Signature
```python
def mark_only_lora_as_trainable(model: nn.Module, bias: str = 'none') -> None:
    """Freeze all parameters except LoRA matrices and optionally biases.

    Args:
        model: The PyTorch model containing LoRA layers
        bias: Bias handling mode - 'none', 'all', or 'lora_only'

    Returns:
        None (modifies model in-place)
    """
```
Import
```python
from loralib import mark_only_lora_as_trainable
# or
import loralib as lora
# then use lora.mark_only_lora_as_trainable
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | PyTorch model containing LoRA-augmented layers |
| bias | str | No (default 'none') | Bias handling mode: 'none', 'all', or 'lora_only' |
Outputs
| Name | Type | Description |
|---|---|---|
| None | None | Modifies model in-place by setting requires_grad on parameters |
Bias Mode Details
| bias Value | Parameters with requires_grad=True |
|---|---|
| none | Only parameters with "lora_" in name |
| all | Parameters with "lora_" in name + all parameters with "bias" in name |
| lora_only | Parameters with "lora_" in name + bias parameters in LoRA-augmented modules only |
Implementation Details
The function operates in two passes:
Pass 1: Iterate over all named parameters. Set requires_grad = False for any parameter whose name does not contain "lora_".
Pass 2 (if bias != 'none'): Re-enable gradients for bias parameters according to the mode:
- all: Set requires_grad = True for any parameter with "bias" in its name
- lora_only: Set requires_grad = True for bias parameters only in modules that also contain lora_ parameters
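The two passes above can be sketched as follows. This is an illustrative reimplementation, not the loralib source: in particular, the real 'lora_only' branch checks isinstance(m, LoRALayer), which the sketch approximates by looking for lora_ parameters owned directly by each module.

```python
import torch
import torch.nn as nn

def mark_only_lora_sketch(model: nn.Module, bias: str = 'none') -> None:
    # Pass 1: freeze every parameter whose name lacks "lora_"
    for n, p in model.named_parameters():
        if 'lora_' not in n:
            p.requires_grad = False
    if bias == 'none':
        return
    # Pass 2: selectively re-enable biases
    if bias == 'all':
        for n, p in model.named_parameters():
            if 'bias' in n:
                p.requires_grad = True
    elif bias == 'lora_only':
        for m in model.modules():
            own = [n for n, _ in m.named_parameters(recurse=False)]
            if any('lora_' in n for n in own) and getattr(m, 'bias', None) is not None:
                m.bias.requires_grad = True
    else:
        raise NotImplementedError(bias)

# Tiny demo: one LoRA-augmented layer plus a plain Linear
class LoRAish(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(4, 4))
        self.bias = nn.Parameter(torch.zeros(4))
        self.lora_A = nn.Parameter(torch.zeros(2, 4))
        self.lora_B = nn.Parameter(torch.zeros(4, 2))

model = nn.Sequential(LoRAish(), nn.Linear(4, 4))
mark_only_lora_sketch(model, bias='lora_only')
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
print(trainable)  # ['0.bias', '0.lora_A', '0.lora_B']
```

Note that the plain Linear's bias stays frozen under 'lora_only' because that module owns no lora_ parameters, matching the table above.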
Usage Examples
Basic Usage (No Bias Training)
```python
import loralib as lora

# After model construction with LoRA layers
lora.mark_only_lora_as_trainable(model)

# Verify: count trainable parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
# Example output: Trainable: 294,912 / 124,439,808 (0.24%)
```
With All Biases Trainable
```python
import loralib as lora

lora.mark_only_lora_as_trainable(model, bias='all')
```
With Only LoRA Layer Biases Trainable
```python
import loralib as lora

lora.mark_only_lora_as_trainable(model, bias='lora_only')
```
Complete Workflow
```python
import torch
import loralib as lora

# 1. Build model with LoRA layers
model = build_model_with_lora(r=8, lora_alpha=16)

# 2. Freeze non-LoRA parameters
lora.mark_only_lora_as_trainable(model, bias='none')

# 3. Create optimizer (only receives LoRA params due to requires_grad filtering)
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=2e-4
)

# 4. Train as usual
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
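A quick sanity check after freezing is that frozen parameters accumulate no gradients at all. The illustrative snippet below uses a plain Linear with a manually frozen weight to stand in for what mark_only_lora_as_trainable does to non-LoRA parameters:

```python
import torch
import torch.nn as nn

# A plain Linear stands in for any layer; freezing its weight mimics
# what mark_only_lora_as_trainable does to non-LoRA parameters.
layer = nn.Linear(4, 2)
layer.weight.requires_grad = False

loss = layer(torch.randn(3, 4)).sum()
loss.backward()

print(layer.weight.grad)          # None: no gradient buffer allocated for frozen params
print(layer.bias.grad is None)    # False: the trainable bias received a gradient
```

Because .grad stays None for frozen parameters, they also consume no optimizer state, which is where much of LoRA's memory saving comes from.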