Implementation: Microsoft LoRA Mark Only LoRA as Trainable
| Knowledge Sources | |
|---|---|
| Domains | Training, Parameter_Efficient_Fine_Tuning |
| Last Updated | 2026-02-10 05:00 GMT |
Overview
Utility function that freezes all model parameters except LoRA matrices and optionally biases.
Description
The mark_only_lora_as_trainable function iterates over all named parameters in a model and sets requires_grad = False for every parameter whose name does not contain the string "lora_". It then optionally re-enables gradients for bias parameters based on the specified bias mode. This is the mechanism that ensures only LoRA parameters receive gradient updates during training.
Usage
Call this function once after model construction (with LoRA layers in place) and before creating the optimizer. The bias parameter should match the bias mode used in lora_state_dict for checkpoint consistency.
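To make the checkpoint-consistency point concrete, the sketch below pairs the bias mode with a name-based filter that mimics what lora_state_dict keeps for each mode. Both DemoLayer and filter_lora_sketch are hypothetical stand-ins for illustration, not part of loralib; the real lora_state_dict also supports 'lora_only'.

```python
import torch
import torch.nn as nn

class DemoLayer(nn.Module):
    """Hypothetical stand-in for a LoRA-augmented linear layer."""
    def __init__(self, in_f: int, out_f: int, r: int):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(out_f, in_f))
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.lora_A = nn.Parameter(torch.zeros(r, in_f))
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))

def filter_lora_sketch(model: nn.Module, bias: str = 'none') -> dict:
    """Keep only the state-dict entries that were trainable under this bias mode."""
    sd = model.state_dict()
    if bias == 'none':
        return {k: v for k, v in sd.items() if 'lora_' in k}
    if bias == 'all':
        return {k: v for k, v in sd.items() if 'lora_' in k or 'bias' in k}
    # The real lora_state_dict also implements 'lora_only'
    raise NotImplementedError(bias)

model = DemoLayer(4, 4, r=2)
# Use the same bias mode when freezing and when checkpointing:
kept = sorted(filter_lora_sketch(model, bias='none'))
print(kept)  # ['lora_A', 'lora_B']
```

If the two bias modes disagree, the saved checkpoint either drops trained bias values or stores biases that were never updated.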
Code Reference
Source Location
- Repository: microsoft/LoRA
- File: loralib/utils.py
- Lines: 13-30
Signature
```python
def mark_only_lora_as_trainable(model: nn.Module, bias: str = 'none') -> None:
    """Freeze all parameters except LoRA matrices and optionally biases.

    Args:
        model: The PyTorch model containing LoRA layers
        bias: Bias handling mode - 'none', 'all', or 'lora_only'

    Returns:
        None (modifies model in-place)
    """
```
Import
```python
from loralib import mark_only_lora_as_trainable
# or
import loralib as lora
# then use lora.mark_only_lora_as_trainable
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | PyTorch model containing LoRA-augmented layers |
| bias | str | No (default 'none') | Bias handling mode: 'none', 'all', or 'lora_only' |
Outputs
| Name | Type | Description |
|---|---|---|
| None | None | Modifies model in-place by setting requires_grad on parameters |
Bias Mode Details
| bias Value | Parameters with requires_grad=True |
|---|---|
| none | Only parameters with "lora_" in name |
| all | Parameters with "lora_" in name + all parameters with "bias" in name |
| lora_only | Parameters with "lora_" in name + bias parameters in LoRA-augmented modules only |
Implementation Details
The function operates in two passes:
Pass 1: Iterate over all named parameters. Set requires_grad = False for any parameter whose name does not contain "lora_".
Pass 2 (if bias != 'none'): Re-enable gradients for bias parameters according to the mode:
- all: Set requires_grad = True for any parameter with "bias" in its name
- lora_only: Set requires_grad = True for bias parameters only in modules that also contain lora_ parameters
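The two passes above can be sketched as follows. This is an illustrative reimplementation, not the loralib source: in particular, the real 'lora_only' branch checks isinstance(m, LoRALayer), which the sketch approximates by looking for lora_ parameters owned directly by each module.

```python
import torch
import torch.nn as nn

def mark_only_lora_sketch(model: nn.Module, bias: str = 'none') -> None:
    # Pass 1: freeze every parameter whose name lacks "lora_"
    for n, p in model.named_parameters():
        if 'lora_' not in n:
            p.requires_grad = False
    if bias == 'none':
        return
    # Pass 2: selectively re-enable biases
    if bias == 'all':
        for n, p in model.named_parameters():
            if 'bias' in n:
                p.requires_grad = True
    elif bias == 'lora_only':
        for m in model.modules():
            own = [n for n, _ in m.named_parameters(recurse=False)]
            if any('lora_' in n for n in own) and getattr(m, 'bias', None) is not None:
                m.bias.requires_grad = True
    else:
        raise NotImplementedError(bias)

# Tiny demo: one LoRA-augmented layer plus a plain Linear
class LoRAish(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(4, 4))
        self.bias = nn.Parameter(torch.zeros(4))
        self.lora_A = nn.Parameter(torch.zeros(2, 4))
        self.lora_B = nn.Parameter(torch.zeros(4, 2))

model = nn.Sequential(LoRAish(), nn.Linear(4, 4))
mark_only_lora_sketch(model, bias='lora_only')
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
print(trainable)  # ['0.bias', '0.lora_A', '0.lora_B']
```

Note that the plain Linear's bias stays frozen under 'lora_only' because that module owns no lora_ parameters, matching the table above.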
Usage Examples
Basic Usage (No Bias Training)
```python
import loralib as lora

# After model construction with LoRA layers
lora.mark_only_lora_as_trainable(model)

# Verify: count trainable parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
# Example output: Trainable: 294,912 / 124,439,808 (0.24%)
```
With All Biases Trainable
```python
import loralib as lora

lora.mark_only_lora_as_trainable(model, bias='all')
```
With Only LoRA Layer Biases Trainable
```python
import loralib as lora

lora.mark_only_lora_as_trainable(model, bias='lora_only')
```
Complete Workflow
```python
import torch
import loralib as lora

# 1. Build model with LoRA layers
model = build_model_with_lora(r=8, lora_alpha=16)

# 2. Freeze non-LoRA parameters
lora.mark_only_lora_as_trainable(model, bias='none')

# 3. Create optimizer (only receives LoRA params due to requires_grad filtering)
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=2e-4
)

# 4. Train as usual
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
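A quick sanity check after freezing is that frozen parameters accumulate no gradients at all. The illustrative snippet below uses a plain Linear with a manually frozen weight to stand in for what mark_only_lora_as_trainable does to non-LoRA parameters:

```python
import torch
import torch.nn as nn

# A plain Linear stands in for any layer; freezing its weight mimics
# what mark_only_lora_as_trainable does to non-LoRA parameters.
layer = nn.Linear(4, 2)
layer.weight.requires_grad = False

loss = layer(torch.randn(3, 4)).sum()
loss.backward()

print(layer.weight.grad)          # None: no gradient buffer allocated for frozen params
print(layer.bias.grad is None)    # False: the trainable bias received a gradient
```

Because .grad stays None for frozen parameters, they also consume no optimizer state, which is where much of LoRA's memory saving comes from.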