Implementation:Huggingface Peft Update And Allocate

From Leeroopedia


Metadata

Field Value
Source PEFT | https://github.com/huggingface/peft
Domains Deep_Learning, Parameter_Efficient_Finetuning
Last Updated 2026-02-07 00:00 GMT

Overview

update_and_allocate is the method on AdaLoraModel that orchestrates the runtime rank allocation mechanism for AdaLoRA. It is the primary interface between the user's training loop and the internal RankAllocator class, handling the three-phase training schedule (warmup, rank reduction, final fine-tuning) and delegating importance score updates and budget masking to the allocator. This method must be called at every training step after loss.backward() and before optimizer.zero_grad().

Source

Files:

  • src/peft/tuners/adalora/model.py, lines 294-334 -- AdaLoraModel.update_and_allocate method
  • src/peft/tuners/adalora/layer.py, lines 196-362 -- RankAllocator class

Repository: huggingface/peft

Signature

def update_and_allocate(self, global_step: int) -> None:
    """
    This method updates Adalora budget and mask.

    This should be called in every training step after loss.backward() and before zero_grad().

    tinit, tfinal and deltaT are handled within the method.

    Args:
        global_step (int): The current training step, used to calculate adalora budget.
    """

Import

This is a method on the AdaLoraModel class, accessed via the base_model attribute of a PEFT-wrapped model. It is not imported directly.

# Access via model.base_model
model.base_model.update_and_allocate(global_step)

Parameters

Parameter Type Default Description
global_step int required The current training step (0-indexed). Used to determine which phase of the three-phase schedule the training is in and to compute the current budget via the cubic decay schedule.

Return Value

Returns None. The method operates by mutating the model's parameters in-place (masking singular values to zero) and updating the internal state of the RankAllocator.

Behavior

The method implements three distinct code paths based on the current global_step relative to the schedule parameters total_step and tfinal:

Phase 1 and 2: Before Final Fine-tuning

When global_step < total_step - tfinal:

if global_step < lora_config.total_step - lora_config.tfinal:
    _, rank_pattern = self.rankallocator.update_and_allocate(self.model, global_step)
    if rank_pattern:
        lora_config.rank_pattern = rank_pattern

This delegates to RankAllocator.update_and_allocate, which:

  1. Calls update_ipt(model) to update importance scores using the current gradients
  2. Calls budget_schedule(global_step) to compute the current budget and whether masking should be applied
  3. If masking is triggered (every deltaT steps during the reduction phase), calls mask_to_budget(model, budget) to zero out the least important singular values
  4. Returns the budget and the resulting rank pattern

During the warmup phase (step <= tinit), importance scores are still updated but mask_ind is False, so no masking is applied.
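
Putting these steps together, the allocator-side dispatch can be sketched as follows. This is a condensed sketch of the behavior described above, not a verbatim copy of the source:

def update_and_allocate(self, model, global_step, force_mask=False):
    # Importance scores are only tracked before the final fine-tuning phase
    if global_step < self.peft_config.total_step - self.peft_config.tfinal:
        self.update_ipt(model)
    budget, mask_ind = self.budget_schedule(global_step)
    rank_pattern = None
    # Mask only on deltaT boundaries, or when the final mask is forced
    if mask_ind or force_mask:
        rank_pattern = self.mask_to_budget(model, budget)
    return budget, rank_pattern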

Transition to Final Phase

When global_step == total_step - tfinal:

elif global_step == lora_config.total_step - lora_config.tfinal:
    _, rank_pattern = self.rankallocator.update_and_allocate(self.model, global_step, force_mask=True)
    lora_config.rank_pattern = rank_pattern
    self.rankallocator.reset_ipt()

This is the transition step where:

  1. A final masking pass is performed with force_mask=True, ensuring the budget is enforced regardless of the deltaT interval
  2. The resulting rank pattern is saved to the config (for later serialization)
  3. The importance tracking state is reset via reset_ipt() to free memory

Final Fine-tuning Phase

When global_step > total_step - tfinal:

elif global_step > lora_config.total_step - lora_config.tfinal:
    self.rankallocator.mask_using_rank_pattern(self.model, lora_config.rank_pattern)

During the final phase, no importance scores are computed. Instead, the previously saved rank_pattern is applied directly to mask the appropriate singular values. This is cheaper than recomputing importance scores and keeps the rank allocation frozen at the pattern selected during the transition step.
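
A condensed sketch of how the saved pattern is reapplied (the actual implementation also handles parameter names with a truncated adapter suffix; see Edge Cases and Notes below):

def mask_using_rank_pattern(self, model, rank_pattern):
    with torch.no_grad():
        for n, p in model.named_parameters():
            if f"lora_E.{self.adapter_name}" in n:
                # rank_pattern maps parameter names to boolean keep-masks
                mask = torch.tensor(rank_pattern[n], dtype=torch.bool, device=p.device)
                # Zero out singular values that were pruned earlier
                p.masked_fill_(~mask.unsqueeze(-1), 0.0)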

The RankAllocator Class

The RankAllocator class (defined in layer.py) manages the internal state for importance scoring and budget scheduling:

Initialization:

class RankAllocator:
    def __init__(self, model, peft_config, adapter_name):
        self.peft_config = peft_config
        self.adapter_name = adapter_name
        # EMA coefficients for sensitivity and uncertainty smoothing
        self.beta1 = peft_config.beta1
        self.beta2 = peft_config.beta2
        self.reset_ipt()
        self._set_budget_scheduler(model)

Internal state:

  • ipt: Dictionary of instantaneous importance scores per parameter
  • exp_avg_ipt: Dictionary of EMA-smoothed importance (sensitivity)
  • exp_avg_unc: Dictionary of EMA-smoothed uncertainty
  • init_bgt: Total initial budget (init_r * number of adapted matrices)
  • target_bgt: Total target budget (target_r * number of adapted matrices)
  • name_set: Sorted set of adapted layer names
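
The budget totals and name_set are derived from the model at construction time. A simplified sketch of that setup, following the logic described above:

def _set_budget_scheduler(self, model):
    self.init_bgt = 0
    self.name_set = set()
    for n, p in model.named_parameters():
        if f"lora_A.{self.adapter_name}" in n:
            # Each adapted matrix contributes init_r ranks to the budget
            self.init_bgt += p.size(0)
            self.name_set.add(n.replace("lora_A", "%s"))
    self.name_set = sorted(self.name_set)
    # Final total budget: target_r ranks per adapted matrix
    self.target_bgt = self.peft_config.target_r * len(self.name_set)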

Importance update (update_ipt):

def update_ipt(self, model):
    for n, p in model.named_parameters():
        if "lora_" in n and self.adapter_name in n:
            with torch.no_grad():
                self.ipt[n] = (p * p.grad).abs().detach()
                # Sensitivity smoothing
                self.exp_avg_ipt[n] = self.beta1 * self.exp_avg_ipt[n] + (1 - self.beta1) * self.ipt[n]
                # Uncertainty quantification
                self.exp_avg_unc[n] = (
                    self.beta2 * self.exp_avg_unc[n]
                    + (1 - self.beta2) * (self.ipt[n] - self.exp_avg_ipt[n]).abs()
                )

Budget schedule (budget_schedule):

def budget_schedule(self, step: int):
    tinit = self.peft_config.tinit
    tfinal = self.peft_config.tfinal
    total_step = self.peft_config.total_step
    if step <= tinit:
        budget = self.init_bgt
        mask_ind = False
    elif step > total_step - tfinal:
        budget = self.target_bgt
        mask_ind = True
    else:
        mul_coeff = 1 - (step - tinit) / (total_step - tfinal - tinit)
        budget = int((self.init_bgt - self.target_bgt) * (mul_coeff ** 3) + self.target_bgt)
        mask_ind = True if step % self.peft_config.deltaT == 0 else False
    return budget, mask_ind
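
To make the cubic decay concrete, the following standalone snippet re-implements the schedule with the hyperparameters used in the usage example below (tinit=200, tfinal=1000, total_step=10000). The budget totals are hypothetical: 64 adapted matrices going from init_r=12 to target_r=4, i.e. 768 -> 256:

def budget(step, tinit=200, tfinal=1000, total_step=10000,
           init_bgt=768, target_bgt=256):
    if step <= tinit:
        return init_bgt    # warmup: full budget
    if step > total_step - tfinal:
        return target_bgt  # final phase: target budget
    mul_coeff = 1 - (step - tinit) / (total_step - tfinal - tinit)
    return int((init_bgt - target_bgt) * mul_coeff**3 + target_bgt)

for s in (0, 200, 2000, 5000, 9000, 9500):
    print(s, budget(s))
# Prints approximately: 768, 768, 513, 304, 256, 256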

Masking (mask_to_budget):

The masking process aggregates triplet importance scores across all adapted layers, computes a global threshold via torch.kthvalue, and zeros out singular values (lora_E entries) that fall below the threshold using masked_fill_.
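
A condensed sketch of this process, showing only the lora_E contribution (the real implementation also folds row scores from lora_A and column scores from lora_B into each triplet's importance):

def _element_score(self, n):
    # Combined importance: smoothed sensitivity times smoothed uncertainty
    return self.exp_avg_ipt[n] * self.exp_avg_unc[n]

def mask_to_budget(self, model, budget):
    triplet_ipt, all_scores = {}, []
    for n, p in model.named_parameters():
        if f"lora_E.{self.adapter_name}" in n:
            triplet_ipt[n] = self._element_score(n).view(-1, 1)
            all_scores.append(triplet_ipt[n].view(-1))
    # Global threshold: keep only the `budget` highest-scoring triplets
    # (assumes budget < total triplets, which holds whenever masking fires)
    scores = torch.cat(all_scores)
    mask_threshold = torch.kthvalue(scores, k=scores.numel() - budget).values.item()
    rank_pattern = {}
    with torch.no_grad():
        for n, p in model.named_parameters():
            if f"lora_E.{self.adapter_name}" in n:
                p.masked_fill_(triplet_ipt[n] <= mask_threshold, 0.0)
                rank_pattern[n] = (~(triplet_ipt[n] <= mask_threshold)).view(-1).tolist()
    return rank_pattern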

Usage Example

Standard training loop with AdaLoRA:

from peft import AdaLoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Setup
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

total_steps = 10000

config = AdaLoraConfig(
    init_r=12,
    target_r=4,
    tinit=200,
    tfinal=1000,
    deltaT=10,
    total_step=total_steps,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Training loop (`dataloader` is assumed to be a user-provided DataLoader
# yielding tokenized batches)
for step, batch in enumerate(dataloader):
    # Forward pass
    outputs = model(**batch)
    loss = outputs.loss

    # Backward pass
    loss.backward()

    # Optimizer step
    optimizer.step()

    # CRITICAL: update_and_allocate must be called here
    # After backward (gradients available) and before zero_grad (gradients cleared)
    model.base_model.update_and_allocate(step)

    # Clear gradients
    optimizer.zero_grad()

    # Stop after exactly total_steps steps (step is 0-indexed)
    if step + 1 >= total_steps:
        break

Monitoring rank allocation during training:

for step, batch in enumerate(dataloader):
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()

    optimizer.step()
    model.base_model.update_and_allocate(step)
    optimizer.zero_grad()

    # Log the current rank pattern periodically
    if step % 500 == 0:
        rank_pattern = model.peft_config["default"].rank_pattern
        if rank_pattern:
            for name, mask in rank_pattern.items():
                active_rank = sum(mask)
                total_rank = len(mask)
                print(f"Step {step} | {name}: rank {active_rank}/{total_rank}")

Edge Cases and Notes

  • Call ordering is critical: The method must be called after loss.backward() (so gradients are available for importance scoring) and before optimizer.zero_grad() (so gradients are not erased before being read). Calling in the wrong order will produce zero importance scores and incorrect rank allocation.
  • DeepSpeed compatibility: When DeepSpeed is detected, the importance update uses deepspeed.utils.safe_get_full_grad(p) instead of p.grad to correctly retrieve gradients in distributed training with ZeRO optimization (see the sketch after this list).
  • Global step tracking: The global_step must be a monotonically increasing integer starting from 0. Passing incorrect step values (e.g., epoch-level counters instead of step-level counters) will produce incorrect budget scheduling.
  • Mask application in final phase: During the final fine-tuning phase, the method applies the saved rank_pattern at every step using mask_using_rank_pattern. This is necessary because the optimizer may update the masked singular values (they still have gradients from the adapter parameters), so the mask must be reapplied after each optimizer step.
  • rank_pattern serialization: The rank_pattern stored in the config uses a dictionary mapping parameter names to boolean lists. When the adapter name is truncated from parameter names (e.g., during checkpoint loading), the mask application code handles this by checking and stripping the adapter name suffix.
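
A minimal sketch of the DeepSpeed-aware gradient retrieval used inside update_ipt. The using_deepspeed flag is hypothetical and stands in for the real detection, which checks whether a DeepSpeed config is active:

def get_adapter_grad(p, using_deepspeed: bool):
    # using_deepspeed: hypothetical flag standing in for the real detection
    if using_deepspeed:
        import deepspeed
        # Gathers the full gradient even when ZeRO shards it across ranks
        return deepspeed.utils.safe_get_full_grad(p)
    return p.grad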
