Implementation:Unslothai Unsloth FastVisionModel Get Peft Model
| Knowledge Sources | |
|---|---|
| Domains | Vision, NLP, Parameter_Efficient_Finetuning |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool, provided by the Unsloth library, for injecting LoRA adapters into vision-language models (VLMs) with selective targeting of vision and language layers.
Description
FastVisionModel.get_peft_model (implemented in FastBaseModel.get_peft_model) extends standard LoRA injection with VLM-specific parameters that control which model components receive adapters. When target_modules is None, it uses get_peft_regex() to distinguish vision layers from language layers and builds the PEFT target module pattern from four flags: finetune_vision_layers, finetune_language_layers, finetune_attention_modules, and finetune_mlp_modules.
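To make the flag-to-pattern idea concrete, here is a minimal sketch of how four boolean flags could be combined into a single regular expression over module names. It is not the actual get_peft_regex() implementation: the helper name build_target_regex, the name fragments ("visual", "language", "attn", "mlp", and so on), and the sample module names are all assumptions chosen for illustration. PEFT treats a string target_modules value as a regex that is full-matched against module names, which is why the sketch checks matches with re.fullmatch.

```python
import re


def build_target_regex(
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    # Hypothetical name fragments; real models use varying naming schemes.
    vision_tags=("vision", "visual"),
    language_tags=("language", "text"),
    attention_tags=("attn", "attention"),
    mlp_tags=("mlp", "feed_forward"),
    leaf_names=("q_proj", "k_proj", "v_proj", "o_proj",
                "gate_proj", "up_proj", "down_proj"),
):
    """Illustrative only: compose a module-name regex from the four flags."""
    if not (finetune_vision_layers or finetune_language_layers):
        raise ValueError("Enable at least one of vision or language layers.")
    if not (finetune_attention_modules or finetune_mlp_modules):
        raise ValueError("Enable at least one of attention or MLP modules.")

    # Which tower (vision encoder / language decoder) a module may live in.
    towers = []
    if finetune_vision_layers:
        towers += vision_tags
    if finetune_language_layers:
        towers += language_tags

    # Which block type (attention / MLP) a module may belong to.
    blocks = []
    if finetune_attention_modules:
        blocks += attention_tags
    if finetune_mlp_modules:
        blocks += mlp_tags

    tower = "|".join(re.escape(t) for t in towers)
    block = "|".join(re.escape(b) for b in blocks)
    leaf = "|".join(re.escape(name) for name in leaf_names)
    return rf".*?(?:{tower}).*?(?:{block}).*?(?:{leaf})$"


# Made-up module names, used only to exercise the pattern.
names = [
    "model.visual.blocks.0.attn.q_proj",
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.up_proj",
]

language_only = build_target_regex(finetune_vision_layers=False)
selected = [n for n in names if re.fullmatch(language_only, n)]

attention_only = build_target_regex(finetune_mlp_modules=False)
attention_selected = [n for n in names if re.fullmatch(attention_only, n)]
```

With these sample names, disabling the vision flag leaves only the two language_model entries in selected, and disabling the MLP flag drops the up_proj entry from attention_selected. The real pattern depends on the architecture being loaded.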
Usage
Call after FastVisionModel.from_pretrained and before configuring the trainer. For most VLM fine-tuning, enable both vision and language layers. For text-only adaptation of a VLM (e.g., changing output format), disable vision layers.
Code Reference
Source Location
- Repository: unsloth
- File: unsloth/models/vision.py
- Lines: L941-1130
Signature
class FastBaseModel:
    @staticmethod
    def get_peft_model(
        model,
        r = 16,
        target_modules = None,
        lora_alpha = 16,
        lora_dropout = 0.0,
        bias = "none",
        finetune_vision_layers = True,
        finetune_language_layers = True,
        finetune_attention_modules = True,
        finetune_mlp_modules = True,
        layers_to_transform = None,
        layers_pattern = None,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407,
        max_seq_length = 2048,
        use_rslora = False,
        modules_to_save = None,
        init_lora_weights = True,
        loftq_config = {},
        task_type = TaskType.CAUSAL_LM,
        temporary_location = "_unsloth_temporary_saved_buffers",
        qat_scheme = None,
        target_parameters = None,
        ensure_weight_tying = False,
        **kwargs,
    ) -> PeftModel:
        """
        Args:
            model: VLM from FastVisionModel.from_pretrained.
            finetune_vision_layers: Apply LoRA to vision encoder. Default True.
            finetune_language_layers: Apply LoRA to language decoder. Default True.
            finetune_attention_modules: Target attention projections. Default True.
            finetune_mlp_modules: Target MLP layers. Default True.
            target_modules: Auto-detected via get_peft_regex() if None.
        """
Import
from unsloth import FastVisionModel
# Called as: FastVisionModel.get_peft_model(model, ...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | PreTrainedModel | Yes | VLM from FastVisionModel.from_pretrained |
| r | int | No | LoRA rank (default: 16) |
| finetune_vision_layers | bool | No | Apply LoRA to vision encoder (default: True) |
| finetune_language_layers | bool | No | Apply LoRA to language decoder (default: True) |
| finetune_attention_modules | bool | No | Target attention projections (default: True) |
| finetune_mlp_modules | bool | No | Target MLP layers (default: True) |
| lora_alpha | int | No | LoRA scaling factor (default: 16) |
| use_gradient_checkpointing | str | No | "unsloth" for optimized checkpointing (default: "unsloth") |
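The r and lora_alpha inputs interact: in standard LoRA the adapter update is multiplied by lora_alpha / r, and with rank-stabilized LoRA (use_rslora=True in the signature) PEFT uses lora_alpha / sqrt(r) instead. The sketch below only illustrates that arithmetic; lora_scaling is a hypothetical helper, not part of Unsloth or PEFT.

```python
import math


def lora_scaling(r: int, lora_alpha: float, use_rslora: bool = False) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if r <= 0:
        raise ValueError("LoRA rank r must be a positive integer.")
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


# Defaults from the signature: r = 16, lora_alpha = 16.
default_scale = lora_scaling(16, 16)
rslora_scale = lora_scaling(16, 16, use_rslora=True)
# Doubling the rank without touching lora_alpha halves the standard scale.
higher_rank_scale = lora_scaling(32, 16)
```

With the default r = 16 and lora_alpha = 16 the standard scale is 1.0, so raising r alone weakens the adapter's contribution unless lora_alpha is raised with it.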
Outputs
| Name | Type | Description |
|---|---|---|
| model | PeftModel | VLM with LoRA adapters on selected vision/language layers, gradient checkpointing configured |
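One way to check which components actually received adapters is to total the trainable LoRA parameters by component. The sketch below assumes PEFT's convention of putting "lora_" in adapter parameter names and assumes that vision-tower parameters can be recognized by a name fragment such as "visual" or "vision"; both the helper count_lora_params and the FakeParam stand-in are hypothetical, and the sample names and sizes are invented so the block runs without a model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter so the sketch runs without a model."""
    size: int
    requires_grad: bool

    def numel(self) -> int:
        return self.size


def count_lora_params(named_parameters, vision_tags=("vision", "visual")):
    """Sum trainable LoRA parameter counts, split into vision and other."""
    counts = Counter()
    for name, param in named_parameters:
        # Skip frozen base weights and anything that is not a LoRA matrix.
        if not param.requires_grad or "lora_" not in name:
            continue
        is_vision = any(tag in name for tag in vision_tags)
        counts["vision" if is_vision else "other"] += param.numel()
    return dict(counts)


# Invented parameter names and shapes, for illustration only.
demo = [
    ("base_model.model.visual.blocks.0.attn.qkv.lora_A.default.weight",
     FakeParam(16 * 1280, True)),
    ("base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight",
     FakeParam(16 * 3584, True)),
    ("base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight",
     FakeParam(3584 * 3584, False)),
]
summary = count_lora_params(demo)
```

With a real adapted model, passing model.named_parameters() in place of demo should work the same way, since the helper only relies on requires_grad and numel(). A vision count of zero after setting finetune_vision_layers=False would be the expected outcome.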
Usage Examples
Full VLM LoRA (Vision + Language)
from unsloth import FastVisionModel

model, processor = FastVisionModel.from_pretrained(
    model_name="unsloth/Qwen2-VL-7B-Instruct-bnb-4bit",
    load_in_4bit=True,
)

model = FastVisionModel.get_peft_model(
    model,
    r=16,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    use_gradient_checkpointing="unsloth",
)
Language-Only LoRA (Freeze Vision Encoder)
model = FastVisionModel.get_peft_model(
    model,
    r=16,
    finetune_vision_layers=False,   # Freeze vision encoder
    finetune_language_layers=True,  # Only adapt language decoder
)