
Implementation:Microsoft LoRA PyTorch Training Loop

From Leeroopedia


Knowledge Sources
Domains Training, Optimization
Pattern Doc Yes
Last Updated 2026-02-10 05:00 GMT

Overview

Pattern documentation for the standard PyTorch training loop used with LoRA-augmented models.

Description

This is a pattern doc describing the interface users must follow when training a LoRA model. Since LoRA requires no custom training logic, the pattern is simply the standard PyTorch training loop with the optimizer constructed to only receive trainable (LoRA) parameters. The Microsoft LoRA repository includes a reference implementation in the GPT-2 fine-tuning example.

Usage

Follow this pattern after completing model preparation (LoRA layer replacement via loralib layers and parameter freezing via mark_only_lora_as_trainable). The pattern is compatible with any PyTorch training framework (raw loops, PyTorch Lightning, the HuggingFace Trainer, etc.).

Code Reference

Reference Implementation

  • Repository: microsoft/LoRA
  • File: examples/NLG/src/gpt2_ft.py
  • Lines: 171-258

Pattern Interface

Prerequisites

  1. Model must have LoRA layers replacing target layers (via loralib.Linear, loralib.MergedLinear, etc.)
  2. Non-LoRA parameters must be frozen (via loralib.mark_only_lora_as_trainable)
  3. Optimizer must be constructed with parameter filtering
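Before entering the loop, it is worth verifying that the freezing step actually worked. The sketch below uses a plain two-layer torch model as an illustrative stand-in for a LoRA-prepared model (the layer shapes and the manual freeze are assumptions for self-containment, not part of the pattern):

```python
import torch
import torch.nn as nn

# Illustrative stand-in: "base" layer frozen, "adapter" layer trainable.
model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 4))
for p in model[0].parameters():
    p.requires_grad = False  # freeze the base layer (prerequisite 2)

trainable = [p for p in model.parameters() if p.requires_grad]
total = sum(p.numel() for p in model.parameters())
n_trainable = sum(p.numel() for p in trainable)
print(f"trainable: {n_trainable} / {total} parameters")

# The optimizer receives only the trainable subset (prerequisite 3).
optimizer = torch.optim.AdamW(trainable, lr=2e-4)
assert all(p.requires_grad
           for group in optimizer.param_groups
           for p in group["params"])
```

With a real LoRA model the trainable count should be a small fraction of the total; a number close to 100% usually means mark_only_lora_as_trainable was never called.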

Core Pattern

import torch
import loralib as lora

# ===== Model Preparation (prerequisites) =====
model = create_model_with_lora_layers()
lora.mark_only_lora_as_trainable(model, bias='none')

# ===== Optimizer Setup =====
# IMPORTANT: Filter to only include trainable parameters
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=2e-4,
    weight_decay=0.01
)

# ===== Standard Training Loop =====
model.train()
for epoch in range(num_epochs):
    for batch in dataloader:
        # Forward pass
        inputs, labels = batch
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)

        # Backward pass
        loss.backward()

        # Optional: gradient clipping
        torch.nn.utils.clip_grad_norm_(
            filter(lambda p: p.requires_grad, model.parameters()),
            max_norm=1.0
        )

        # Optimizer step
        optimizer.step()
        optimizer.zero_grad()
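Checkpointing follows the same filtering idea: only the LoRA parameters need to be saved, which loralib exposes as lora.lora_state_dict(model). A manual equivalent is sketched below with a toy module that follows loralib's lora_A/lora_B naming convention (the module itself is illustrative, not loralib's implementation):

```python
import torch
import torch.nn as nn

class ToyLoRALinear(nn.Module):
    """Illustrative module mimicking loralib's lora_A/lora_B naming."""
    def __init__(self, d_in, d_out, r=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in),
                                   requires_grad=False)  # frozen base
        self.lora_A = nn.Parameter(torch.zeros(r, d_in))
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))

model = ToyLoRALinear(16, 8)

# Manual equivalent of lora.lora_state_dict(model, bias='none'):
# keep only entries whose names mark them as LoRA weights.
lora_sd = {k: v for k, v in model.state_dict().items() if "lora_" in k}
torch.save(lora_sd, "lora_ckpt.pt")

# To resume, load the base checkpoint first, then the LoRA weights with
# strict=False so the missing base keys are tolerated.
model.load_state_dict(torch.load("lora_ckpt.pt"), strict=False)
```

The resulting checkpoint contains only the adapter tensors, which is why LoRA checkpoints are typically orders of magnitude smaller than full model checkpoints.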

I/O Contract

Inputs

| Name          | Type       | Required | Description                                                       |
|---------------|------------|----------|-------------------------------------------------------------------|
| model         | nn.Module  | Yes      | Model with LoRA layers and frozen base parameters                 |
| dataloader    | DataLoader | Yes      | Standard PyTorch DataLoader providing training batches            |
| learning_rate | float      | Yes      | Learning rate for the optimizer (typically 1e-4 to 5e-4 for LoRA) |
| num_epochs    | int        | Yes      | Number of training epochs                                         |

Outputs

| Name  | Type      | Description                                               |
|-------|-----------|-----------------------------------------------------------|
| model | nn.Module | Model with updated LoRA parameters (base weights unchanged) |
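The output contract (base weights unchanged, only LoRA parameters updated) can be checked directly after a training step. A self-contained sketch, with a frozen Linear layer as the base and a single trainable tensor standing in for a LoRA update:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

base = nn.Linear(8, 8)
for p in base.parameters():
    p.requires_grad = False                  # frozen base weights
adapter = nn.Parameter(torch.zeros(8, 8))    # stand-in for a LoRA update

# Only the adapter is handed to the optimizer.
optimizer = torch.optim.AdamW([adapter], lr=1e-2)

x = torch.randn(4, 8)
before = base.weight.clone()

out = base(x) + x @ adapter.t()              # base output + low-rank-style update
loss = out.pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()

assert torch.equal(base.weight, before)               # base unchanged
assert not torch.equal(adapter, torch.zeros(8, 8))    # adapter was updated
```

This check is cheap enough to run once at the start of a training job as a guard against accidentally unfrozen base parameters.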

Usage Examples

Minimal Training Loop

import torch
import loralib as lora

# Assume model is already prepared with LoRA layers and frozen params
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=2e-4
)

model.train()
for batch in dataloader:
    loss = model(**batch).loss  # assumes an HF-style output exposing .loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

With Learning Rate Scheduler

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR
import loralib as lora

optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=5e-4,
    weight_decay=0.01
)
scheduler = CosineAnnealingLR(optimizer, T_max=num_steps)

model.train()
for batch in dataloader:
    loss = model(**batch).loss  # assumes an HF-style output exposing .loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        filter(lambda p: p.requires_grad, model.parameters()),
        max_norm=1.0
    )
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
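Note that CosineAnnealingLR above is stepped once per batch, so T_max should be the total number of optimizer steps, not the number of epochs. A quick sketch of the calculation (the concrete numbers are illustrative):

```python
# Total optimizer steps for a per-batch scheduler.
num_epochs = 3
batches_per_epoch = 250                      # i.e. len(dataloader)
num_steps = num_epochs * batches_per_epoch   # pass as T_max
```

If gradient accumulation is used, divide by the accumulation factor as well, since only one optimizer (and scheduler) step occurs per window.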

Reference: GPT-2 Fine-Tuning (from repository)

The file examples/NLG/src/gpt2_ft.py (lines 171-258) demonstrates the full training loop pattern used in the official LoRA repository for fine-tuning GPT-2:

# Simplified from examples/NLG/src/gpt2_ft.py
optimizer = create_adam_optimizer_from_args(model, args)

for epoch in range(args.n_epochs):
    model.train()
    for batch in train_loader:
        # Forward
        output = model(batch)
        loss = output.loss

        # Backward
        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)

        # Update
        optimizer.step()
        optimizer.zero_grad()

    # Evaluation
    model.eval()
    evaluate(model, eval_loader)
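The evaluate call in the reference is repo-specific. A generic evaluation pass (an illustrative sketch, not the repository's implementation) disables gradient tracking and averages the loss over the eval set:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def evaluate(model, eval_loader, loss_fn):
    """Average per-example loss; assumes (inputs, labels) batches."""
    model.eval()
    total, n = 0.0, 0
    for inputs, labels in eval_loader:
        total += loss_fn(model(inputs), labels).item() * len(inputs)
        n += len(inputs)
    model.train()  # restore training mode for the next epoch
    return total / n

# Toy usage with a plain model and in-memory batches.
torch.manual_seed(0)
model = nn.Linear(8, 1)
eval_loader = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(3)]
avg = evaluate(model, eval_loader, nn.MSELoss())
```

For LoRA models nothing changes here: evaluation runs the full (base + adapter) forward pass, and @torch.no_grad() makes the frozen/trainable distinction irrelevant.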

Related Pages

Implements Principle

Requires Environment
