Principle:Intel Ipex llm LISA Dynamic Layer Training
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Finetuning, Memory_Efficient_Training |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Training technique that dynamically activates different subsets of model layers during training to reduce memory requirements while approximating full fine-tuning.
Description
LISA (Layerwise Importance Sampled AdamW) trains only a small subset of the model's layers at each training step and re-selects which layers are active at a configurable interval. Because different layers are trained at different times, enough re-selections cover the entire model over the course of training, yet gradients are never required for all layers at once. Since gradients are computed and stored only for the currently active layers, peak memory use is substantially lower than in full fine-tuning.
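To make the rotation concrete, here is a minimal sketch that simulates only the layer-selection schedule in plain Python, with no real model or optimizer. The function names `lisa_schedule` and `coverage` are hypothetical illustrations, not IPEX-LLM APIs, and the sketch assumes each re-selection draws layers uniformly at random without replacement.

```python
import random


def lisa_schedule(num_layers, activated_layers, interval_steps, total_steps, seed=0):
    """Return, for every training step, the sorted list of trainable layer indices.

    Hypothetical helper: every `interval_steps` steps a fresh subset of
    `activated_layers` layers is drawn uniformly at random without replacement;
    all other layers stay frozen until the next draw.
    """
    if not 0 < activated_layers <= num_layers:
        raise ValueError("activated_layers must be in [1, num_layers]")
    if interval_steps <= 0:
        raise ValueError("interval_steps must be positive")
    rng = random.Random(seed)
    schedule = []
    active = []
    for step in range(total_steps):
        if step % interval_steps == 0:
            active = sorted(rng.sample(range(num_layers), activated_layers))
        schedule.append(active)
    return schedule


def coverage(schedule, num_layers):
    """Fraction of layers that were trainable at least once during the run."""
    touched = set()
    for active in schedule:
        touched.update(active)
    return len(touched) / num_layers


if __name__ == "__main__":
    # Hypothetical setting: 32 layers, 2 active at a time, re-sampled every 20 steps.
    sched = lisa_schedule(num_layers=32, activated_layers=2, interval_steps=20, total_steps=2000)
    print("layers trainable per step:", {len(a) for a in sched})  # always {2}
    print("coverage after 2000 steps:", coverage(sched, 32))
```

Every step reports exactly `activated_layers` trainable layers, while the coverage value shows how much of the model has been touched after many re-selections.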
Usage
Use this principle as an alternative to LoRA when full-parameter fine-tuning quality is desired but memory is limited. LISA offers a middle ground: like full fine-tuning, it updates the model's original weights (whereas LoRA freezes them and trains newly added low-rank adapter matrices), but it updates only a few layers at a time to keep memory low.
Theoretical Basis
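At each training step, only $k$ out of $L$ total layers have trainable parameters. Every $T$ steps, a new random subset of $k$ layers is selected (in the pseudo-code, $k$ is `lisa_activated_layers` and $T$ is `lisa_interval_steps`):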
Pseudo-code Logic:
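```python
import random

# Sketch only: `model.layers` (the transformer blocks), `model(batch)` returning the loss,
# `train_loader`, `optimizer`, and the two lisa_* settings are assumed to exist.
for step, batch in enumerate(train_loader):
    if step % lisa_interval_steps == 0:
        # Freeze every layer, then unfreeze a fresh random subset of layers.
        active = set(random.sample(range(len(model.layers)), k=lisa_activated_layers))
        for idx, layer in enumerate(model.layers):
            layer.requires_grad_(idx in active)
    loss = model(batch)
    loss.backward()  # only parameters with requires_grad=True receive gradients
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)  # clear grads so frozen layers keep none
```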
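The claim that rotation eventually covers the whole model can be quantified. Write $L$ for the number of layers, $k$ for `lisa_activated_layers`, $T$ for `lisa_interval_steps`, and $S$ for the total number of training steps, so a run performs $\lceil S/T \rceil$ re-selections. Assuming, as a simplification, that each re-selection draws $k$ layers uniformly at random without replacement and independently of earlier draws:

```latex
\begin{aligned}
\Pr[\text{layer } i \text{ is active in one interval}] &= \binom{L-1}{k-1} \Big/ \binom{L}{k} = \frac{k}{L},\\
\mathbb{E}[\#\text{intervals in which layer } i \text{ is active}] &= \left\lceil \frac{S}{T} \right\rceil \cdot \frac{k}{L},\\
\Pr[\text{layer } i \text{ is never active}] &= \left(1 - \frac{k}{L}\right)^{\lceil S/T \rceil}.
\end{aligned}
```

For example, with hypothetical values $L = 32$, $k = 2$, $T = 20$, and $S = 2000$, only $1/16$ of the layers need gradients at any step, there are 100 re-selections, and the chance that a particular layer is never trained is $(30/32)^{100} \approx 0.0016$.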