
Principle:OpenGVLab InternVL Vision Encoder LoRA

From Leeroopedia


Knowledge Sources
Domains Parameter_Efficient_Finetuning, Computer_Vision
Last Updated 2026-02-07 00:00 GMT

Overview

Application of Low-Rank Adaptation to the vision encoder (InternViT) component of a vision-language model, enabling parameter-efficient fine-tuning of visual representations.

Description

While LoRA is most commonly applied to the language model, InternVL also supports applying LoRA to the vision encoder (InternViT). This is useful when the vision encoder needs task-specific adaptation (e.g., for medical imaging or remote sensing) but full fine-tuning is too expensive.

Vision encoder LoRA targets the attention and MLP layers of InternViT:

  • Attention: attn.qkv, attn.proj
  • MLP: mlp.fc1, mlp.fc2

This is controlled by the use_backbone_lora argument in ModelArguments. When set to a positive integer (the LoRA rank), adapters are injected into the vision encoder.
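The matching step can be sketched as follows. This is a minimal illustration of how target-module selection works, not InternVL's actual implementation (which delegates the wrapping to the peft library); the module names are representative ViT-style names chosen for the example.

```python
# Suffixes of InternViT linear layers that receive LoRA adapters
# when use_backbone_lora > 0 (the value is used as the LoRA rank).
TARGET_SUFFIXES = ("attn.qkv", "attn.proj", "mlp.fc1", "mlp.fc2")

def is_lora_target(module_name: str) -> bool:
    """Return True if a vision-encoder layer should get a LoRA adapter."""
    return module_name.endswith(TARGET_SUFFIXES)

# Illustrative module names as they might appear in a ViT-style encoder.
names = [
    "encoder.layers.0.attn.qkv",
    "encoder.layers.0.attn.proj",
    "encoder.layers.0.mlp.fc1",
    "encoder.layers.0.mlp.fc2",
    "encoder.layers.0.norm1",   # LayerNorm: left frozen, not adapted
]
targets = [n for n in names if is_lora_target(n)]
```

Only the four projection/MLP linears match; normalization layers and embeddings stay frozen without adapters.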

Usage

Use vision encoder LoRA when you need to adapt the visual representation for a specific domain while keeping most parameters frozen. It is less common than LLM LoRA and is typically used alongside it for domain-specific adaptation.
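A minimal sketch of the relevant configuration fields follows. Only `use_backbone_lora` is confirmed by this page; the `use_llm_lora` field name and the default values are assumptions for illustration and should be checked against InternVL's actual `ModelArguments`.

```python
from dataclasses import dataclass

@dataclass
class ModelArguments:
    # 0 disables vision-encoder LoRA; a positive value is the LoRA rank.
    use_backbone_lora: int = 0
    # Assumed field name for the language-model adapters (rank, 0 = off).
    use_llm_lora: int = 0

# Typical domain-adaptation setup: adapt both the vision encoder and the LLM.
args = ModelArguments(use_backbone_lora=16, use_llm_lora=16)
enable_vision_lora = args.use_backbone_lora > 0
```

Setting both ranks adapts the vision encoder and the language model together, which is the combination the Usage section describes.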

Theoretical Basis

Same LoRA formulation as LLM LoRA, h = Wx + (α/r)BAx, applied to vision transformer layers instead of language model layers.
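The formulation above can be sketched numerically. This is a minimal NumPy illustration of the LoRA forward pass with the standard zero-initialized up-projection, not InternVL's code; all dimensions are illustrative.

```python
import numpy as np

# LoRA forward pass: h = W x + (alpha / r) * B @ A @ x
d_out, d_in, r, alpha = 8, 8, 2, 4

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter contributes nothing at the start
# of training, so the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Because B starts at zero, injecting adapters does not perturb the pretrained visual representation; only gradient updates to A and B change the output.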

Target modules for InternViT:

  • attn.qkv: Combined query/key/value projection
  • attn.proj: Output projection
  • mlp.fc1: First MLP layer
  • mlp.fc2: Second MLP layer
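The parameter savings from adapting only these four linears per block can be worked out directly. The hidden size and MLP expansion ratio below are illustrative placeholders, not the actual InternViT configuration; check the model config for real values.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r) per adapted linear layer.
    return r * (d_in + d_out)

d = 1024   # illustrative hidden size
r = 16     # LoRA rank (the value passed as use_backbone_lora)
layers = {
    "attn.qkv": (d, 3 * d),   # fused query/key/value projection
    "attn.proj": (d, d),      # attention output projection
    "mlp.fc1": (d, 4 * d),    # first MLP layer (4x expansion assumed)
    "mlp.fc2": (4 * d, d),    # second MLP layer
}
per_block = sum(lora_param_count(di, do, r) for di, do in layers.values())
full_block = sum(di * do for di, do in layers.values())
```

With these illustrative dimensions the adapters amount to roughly 2% of the parameters in the layers they adapt, which is what makes backbone LoRA cheap relative to full fine-tuning.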

