

Implementation:Unslothai Unsloth FastLanguageModel Get Peft Model

From Leeroopedia


Knowledge Sources

  • Domains: Deep_Learning, Parameter_Efficient_Finetuning, NLP
  • Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete tool for injecting LoRA adapters into language models, with fused-kernel optimizations provided by the Unsloth library.

Description

FastLanguageModel.get_peft_model wraps Hugging Face PEFT's LoRA injection with Unsloth-specific optimizations. It creates a LoraConfig, applies it via PEFT's get_peft_model, and then patches the resulting model with:

  • Fused LoRA MLP kernels (combining gate/up projections with SwiGLU activation)
  • Unsloth's memory-efficient gradient checkpointing
  • for_inference() and for_training() mode switching methods
  • Optional support for MoE expert layer targeting via target_parameters
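The fused MLP kernel in the first bullet combines the gate and up projections with the SwiGLU activation in a single pass. As a reference for what that fusion computes, here is a pure-Python sketch of the math (illustrative only, not Unsloth's actual kernel):

```python
import math

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # Reference semantics of the Llama-style MLP that Unsloth fuses:
    #   down_proj( silu(gate_proj(x)) * up_proj(x) )
    gate = [sum(w * xi for w, xi in zip(row, x)) for row in w_gate]
    up = [sum(w * xi for w, xi in zip(row, x)) for row in w_up]
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]
```

Fusing these three matrix multiplies and the elementwise activation into one kernel avoids materializing the intermediate gate/up activations separately, which is where the memory and speed savings come from.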

Usage

Call immediately after FastLanguageModel.from_pretrained and before configuring the trainer. This is the standard LoRA injection for all text-only SFT and RL workflows. For vision-language models, use the vision-specific variant instead.

Code Reference

Source Location

  • Repository: unsloth
  • File: unsloth/models/llama.py
  • Lines: L2636-3139

Signature

class FastLlamaModel:
    @staticmethod
    def get_peft_model(
        model,
        r = 16,
        target_modules = [
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
        lora_alpha = 16,
        lora_dropout = 0.0,
        bias = "none",
        layers_to_transform = None,
        layers_pattern = None,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407,
        max_seq_length = 2048,
        use_rslora = False,
        modules_to_save = None,
        init_lora_weights = True,
        loftq_config = {},
        temporary_location = "_unsloth_temporary_saved_buffers",
        qat_scheme = None,
        target_parameters = None,
        ensure_weight_tying = False,
        **kwargs,
    ) -> PeftModel:
        """
        Applies LoRA adapters with Unsloth optimizations.

        Args:
            model: Base model from from_pretrained.
            r: LoRA rank. Default 16.
            target_modules: Linear layers to apply LoRA. Default all attn + MLP.
            lora_alpha: LoRA scaling factor. Default 16.
            lora_dropout: Dropout for LoRA layers. Default 0.0.
            bias: Bias training mode ("none", "all", "lora_only").
            use_gradient_checkpointing: "unsloth" for optimized checkpointing.
            use_rslora: Use rank-stabilized LoRA. Default False.
            modules_to_save: Non-LoRA modules to train (e.g., ["embed_tokens", "lm_head"]).
            target_parameters: Dict for MoE expert layer parameters.
        """

Import

from unsloth import FastLanguageModel
# Called as: FastLanguageModel.get_peft_model(model, ...)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| model | PreTrainedModel | Yes | Base model from from_pretrained |
| r | int | No | LoRA rank (default: 16) |
| target_modules | list[str] | No | Linear layers for LoRA (default: all attn + MLP projections) |
| lora_alpha | int | No | LoRA scaling factor (default: 16) |
| lora_dropout | float | No | Dropout rate (default: 0.0, recommended to keep at 0) |
| use_gradient_checkpointing | str | No | "unsloth" for optimized checkpointing |
| use_rslora | bool | No | Rank-stabilized LoRA (default: False) |
| modules_to_save | list[str] | No | Non-LoRA trainable modules (e.g., embed_tokens, lm_head) |
| target_parameters | dict | No | MoE expert layer targeting |
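For sizing intuition: each adapted Linear layer of shape d_in → d_out gains r · (d_in + d_out) trainable parameters (an r × d_in matrix A plus a d_out × r matrix B), which is why the default r=16 keeps the trainable fraction small. A back-of-the-envelope sketch, using illustrative Llama-3.2-3B-like projection shapes (hidden 3072, intermediate 8192, 28 layers; assumed here for arithmetic, not read from the model config):

```python
def lora_param_count(shapes, r):
    # Each adapted Linear(d_in -> d_out) gains A (r x d_in) + B (d_out x r)
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)

# Illustrative per-layer projection shapes (d_in, d_out):
per_layer = [
    (3072, 3072),  # q_proj
    (3072, 1024),  # k_proj
    (3072, 1024),  # v_proj
    (3072, 3072),  # o_proj
    (3072, 8192),  # gate_proj
    (3072, 8192),  # up_proj
    (8192, 3072),  # down_proj
]
print(lora_param_count(per_layer * 28, r=16))  # adapters across 28 layers
```

This gives roughly 24M trainable parameters under these assumed shapes, a small fraction of a 3B-parameter base model; model.print_trainable_parameters() reports the exact figure for your model.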

Outputs

| Name | Type | Description |
|------|------|-------------|
| model | PeftModel | Model with LoRA adapters, fused kernels, gradient checkpointing, and for_inference/for_training methods |

Usage Examples

Standard LoRA for SFT

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Apply LoRA to all attention and MLP layers
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Check trainable parameters
model.print_trainable_parameters()

High-Rank LoRA for RL

model = FastLanguageModel.get_peft_model(
    model,
    r=64,              # Higher rank for RL
    lora_alpha=64,
    use_rslora=True,   # Rank-stabilized LoRA
)
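Why use_rslora matters at higher ranks: PEFT scales the LoRA update by lora_alpha / r in the standard case, but by lora_alpha / sqrt(r) when rank-stabilized LoRA is enabled, so the adapter's effective magnitude does not shrink as r grows. A quick numeric check of the two scaling rules:

```python
import math

def lora_scaling(lora_alpha, r, use_rslora=False):
    # PEFT's scaling: alpha / r (standard) vs. alpha / sqrt(r) (rank-stabilized)
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

# At r=64 with alpha=64, standard scaling is 1.0, while rsLoRA scales by 8.0:
print(lora_scaling(64, 64))         # 1.0
print(lora_scaling(64, 64, True))   # 8.0
```

This is why the RL example above pairs r=64 with use_rslora=True rather than inflating lora_alpha by hand.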

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
