
Heuristic:WAInjectBench LoRA Rank Alpha Selection

From Leeroopedia
Knowledge Sources
Domains LLMs, Optimization, Deep_Learning
Last Updated 2026-02-14 16:00 GMT

Overview

Default LoRA hyperparameter selection (r=8, alpha=32, dropout=0.05) for fine-tuning LLaVA vision-language models on binary classification tasks.

Description

When fine-tuning LLaVA-1.5-7b for prompt injection detection, the codebase uses Low-Rank Adaptation (LoRA) with specific default hyperparameters. The rank `r=8` provides a good balance between parameter efficiency and expressiveness. The alpha `alpha=32` (4x the rank) sets the scaling factor that amplifies the LoRA contribution relative to the frozen weights. The dropout `0.05` provides light regularization. Target modules cover all major linear layers in both the language model (q/k/v/o projections and the MLP gate/up/down projections) and the vision encoder (fc1, fc2, Wqkv, out_proj, proj, dense).

Usage

Use this heuristic when fine-tuning LLaVA or similar vision-language models with LoRA for binary classification tasks. The defaults in the codebase represent the paper's recommended configuration.

The Insight (Rule of Thumb)

  • Action: Set LoRA config with `r=8`, `lora_alpha=32`, `lora_dropout=0.05`, `bias="none"`, `task_type=CAUSAL_LM`.
  • Value: Target a broad set of modules: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "fc1", "fc2", "Wqkv", "out_proj", "proj", "dense"]`.
  • Trade-off: Higher rank increases trainable parameters but improves capacity. The alpha/rank ratio of 4:1 is a standard practice that amplifies the LoRA update relative to pretrained weights.
  • Compatibility: After LoRA wrapping, all base parameters are frozen and only `lora_*` parameters are set to `requires_grad=True`. The `enable_input_require_grads()` call is needed for gradient flow through the frozen base.
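The scaling behavior behind the alpha/rank ratio can be illustrated with a minimal NumPy sketch (hypothetical shapes; a stand-in for the actual PEFT implementation): the low-rank update `B @ A @ x` is multiplied by `alpha / r` before being added to the frozen projection.

```python
import numpy as np

r, alpha = 8, 32
d_in, d_out = 4096, 4096           # hypothetical q_proj shape
scaling = alpha / r                # 4.0 with the defaults above

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # LoRA down-projection (trained)
B = np.zeros((d_out, r))               # LoRA up-projection, zero-initialized

x = rng.normal(size=(d_in,))
# Effective forward pass: frozen path plus scaled low-rank update
y = W @ x + scaling * (B @ (A @ x))
```

Because `B` starts at zero, the update contributes nothing at step zero, so fine-tuning begins exactly from the pretrained behavior; as `A` and `B` are trained, the 4x scaling amplifies their contribution.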

Reasoning

The alpha-to-rank ratio of 4:1 is a widely adopted convention in the LoRA community. With `r=8`, the number of trainable parameters is a small fraction of the total (well under 1% for a 7B model), making training feasible on a single GPU. The broad target module list ensures that LoRA adapters are injected into both the language decoder and the vision encoder, which matters for multimodal tasks where both modalities need adaptation.
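A back-of-the-envelope count makes the fraction concrete. This sketch assumes LLaMA-7B's published decoder dimensions (hidden size 4096, MLP intermediate size 11008, 32 layers) and omits the vision-tower adapters for simplicity; each wrapped linear layer of shape `d_out x d_in` adds `r * (d_in + d_out)` LoRA parameters.

```python
r = 8
hidden, inter, layers = 4096, 11008, 32

attn = 4 * r * (hidden + hidden)   # q/k/v/o projections
mlp = 3 * r * (hidden + inter)     # gate/up/down projections
trainable = layers * (attn + mlp)  # ~20M adapter parameters

total = 7e9                        # ~7B base parameters
fraction = trainable / total       # roughly 0.3% of the model
```

Even with the broad target list, the adapters amount to only a few tens of millions of parameters, which is why single-GPU fine-tuning is feasible.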

The README states: "Experiments in our paper use the default hyperparameters", confirming these values are the paper's recommended settings.

Code Evidence

LoRA configuration from `train/llava-ft.py:127-148`:

def try_wrap_lora(model, lora_r, lora_alpha, lora_dropout):
    from peft import LoraConfig, get_peft_model, TaskType

    # Adapters span both the language decoder (q/k/v/o, MLP projections)
    # and the vision encoder (fc1/fc2, Wqkv, out_proj, proj, dense)
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "fc1", "fc2", "Wqkv", "out_proj", "proj", "dense"
    ]

    cfg = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        bias="none",
        task_type=TaskType.CAUSAL_LM,
        target_modules=target_modules,
    )
    # Wrap in place: model.model becomes a PeftModel over the frozen base
    model.model = get_peft_model(model.model, cfg)

Default argument values from `train/llava-ft.py:227-229`:

ap.add_argument("--lora_r", type=int, default=8)
ap.add_argument("--lora_alpha", type=int, default=32)
ap.add_argument("--lora_dropout", type=float, default=0.05)

Trainable parameter isolation from `train/llava-ft.py:150-158`:

# First pass: freeze every parameter of the wrapped model
for name, param in model.model.named_parameters():
    param.requires_grad = False

# Second pass: re-enable gradients only for LoRA adapter weights
for name, param in model.model.named_parameters():
    if "lora_" in name.lower():
        param.requires_grad = True
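The freeze-then-unfreeze pattern above can be sketched without loading a model, using mock parameters (hypothetical names; a real run iterates over `model.named_parameters()`). The two passes collapse to a single membership test on the parameter name.

```python
class Param:
    """Minimal stand-in for a torch parameter's requires_grad flag."""
    def __init__(self):
        self.requires_grad = True

# Mock parameter dict mimicking a LoRA-wrapped projection layer
params = {
    "model.layers.0.self_attn.q_proj.weight": Param(),
    "model.layers.0.self_attn.q_proj.lora_A.default.weight": Param(),
    "model.layers.0.self_attn.q_proj.lora_B.default.weight": Param(),
}

# Freeze everything except LoRA adapter weights, in one pass
for name, p in params.items():
    p.requires_grad = "lora_" in name.lower()

trainable = [n for n, p in params.items() if p.requires_grad]
```

Only the `lora_A`/`lora_B` entries remain trainable, matching the isolation the training script enforces on the real model.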
