
Principle:Unslothai Unsloth LoRA Adapter Injection

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Parameter_Efficient_Finetuning, NLP
Last Updated 2026-02-07 00:00 GMT

Overview

A parameter-efficient fine-tuning technique that injects trainable low-rank decomposition matrices into frozen pretrained model layers, enabling adaptation with a fraction of the original parameter count.

Description

Low-Rank Adaptation (LoRA) addresses the prohibitive cost of full fine-tuning for large language models. Instead of updating all model parameters, LoRA freezes the pretrained weights and adds small trainable rank-decomposition matrices to selected linear layers. For a weight matrix W ∈ ℝ^(d×k), LoRA adds ΔW = BA, where B ∈ ℝ^(d×r) and A ∈ ℝ^(r×k) with rank r ≪ min(d, k).

The key advantages are:

  1. Memory Efficiency: Only LoRA parameters require optimizer states and gradients, reducing training memory by 3-4x.
  2. Training Speed: Fewer trainable parameters mean faster gradient computation.
  3. Composability: Multiple LoRA adapters can be trained independently and switched at inference time.
  4. Merge Capability: Trained LoRA weights can be merged back into the base model for deployment without inference overhead.
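The merge capability in point 4 can be verified numerically: folding the scaled low-rank update into the base weight produces a single matrix whose outputs match the base-plus-adapter path exactly. A minimal NumPy sketch, with illustrative shapes not taken from any specific model:

```python
import numpy as np

# Hypothetical shapes for one linear layer (illustrative only)
d, k, r, alpha = 6, 4, 2, 4
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d, k))   # frozen pretrained weight
A = rng.normal(size=(r, k))    # trained LoRA factor A (r x k)
B = rng.normal(size=(d, r))    # trained LoRA factor B (d x r)

# Merge: fold the scaled low-rank update into the base weight
W_merged = W0 + (alpha / r) * (B @ A)

# The merged layer reproduces the base + adapter outputs exactly,
# so deployment needs no extra matmuls at inference time
x = rng.normal(size=(3, k))
adapter_path = x @ W0.T + (alpha / r) * (x @ A.T @ B.T)
merged_path = x @ W_merged.T
assert np.allclose(adapter_path, merged_path)
```

Because the merge is exact (not an approximation), it can also be undone by subtracting (α/r)·BA, which is how adapter switching on a shared base model works.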

In the Unsloth context, LoRA injection also involves:

  • Patching forward methods with fused LoRA MLP kernels
  • Configuring Unsloth's optimized gradient checkpointing
  • Attaching for_inference and for_training mode-switching methods

Usage

Apply this principle immediately after loading a quantized model and before configuring the trainer. Target modules typically include all attention projections (q, k, v, o) and MLP layers (gate, up, down). The rank r controls the capacity-efficiency tradeoff: higher ranks (32-64) for complex tasks, lower ranks (8-16) for simpler adaptations.
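The capacity-efficiency tradeoff above can be made concrete with a quick parameter count. The layer size below (a 4096×4096 attention projection, typical of a ~7B model) is illustrative, not taken from this page:

```python
# Trainable parameters for one d x k linear layer under LoRA
d, k = 4096, 4096          # illustrative attention projection size
full = d * k               # full fine-tuning updates every entry

for r in (8, 16, 32, 64):
    lora = r * (d + k)     # B is d x r, A is r x k
    print(f"r={r:2d}: {lora:,} LoRA params ({100 * lora / full:.2f}% of full)")
```

Even at r = 64, the adapter holds about 3% of the layer's parameters, which is why optimizer-state and gradient memory drop so sharply.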

Theoretical Basis

For a pretrained weight matrix W₀, the adapted forward pass becomes:

h = W₀x + (α/r)·BAx

Where:

  • A is initialized from a random Gaussian distribution
  • B is initialized to zero (so ΔW = 0 at start)
  • α is a scaling factor (lora_alpha) controlling adaptation magnitude
  • r is the rank (r parameter)

# Abstract LoRA forward pass
def lora_forward(x, W_frozen, A, B, alpha, r):
    base_output = x @ W_frozen.T          # Frozen pretrained computation
    lora_output = x @ A.T @ B.T           # Low-rank adaptation
    return base_output + (alpha / r) * lora_output

The ratio α/r acts as a learning-rate modifier for the LoRA parameters: increasing α amplifies the adapter's contribution without retraining. Setting α = r makes the scale factor 1, so the BA update enters the forward pass unscaled.
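The initialization scheme listed above can be checked numerically: with B zeroed, ΔW = BA = 0, so the adapted forward pass reproduces the frozen model's output exactly at step 0. A small sketch with arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4

W = rng.normal(size=(d, k))
A = rng.normal(size=(r, k)) * 0.01   # small Gaussian init for A
B = np.zeros((d, r))                 # zero init for B -> delta W = BA = 0

x = rng.normal(size=(3, k))
base = x @ W.T
adapted = base + (alpha / r) * (x @ A.T @ B.T)

# Before any training step, the adapter contributes nothing
assert np.allclose(base, adapted)
```

This is why LoRA training starts from the pretrained model's exact behavior: gradients flow into A and B, but the initial forward pass is unchanged.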
