Principle:AUTOMATIC1111 Stable diffusion webui Network weight patching
| Knowledge Sources | |
|---|---|
| Domains | Stable Diffusion, LoRA, Monkey-Patching, Weight Injection, PyTorch |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Network weight patching modifies a pretrained model's weight tensors at runtime by adding low-rank adaptation (LoRA) deltas computed from loaded network modules. Monkey-patched forward methods trigger the weight application lazily, on the first forward call after the set of loaded networks changes.
Description
Stable Diffusion WebUI implements LoRA weight application through a two-layer mechanism:
Layer 1 -- Monkey-patching forward methods: The LoraPatches class replaces the forward() method of all torch.nn.Linear, torch.nn.Conv2d, torch.nn.GroupNorm, torch.nn.LayerNorm, and torch.nn.MultiheadAttention modules in PyTorch with custom versions. These replacements call the weight application function before delegating to the original forward method.
Layer 2 -- Lazy weight modification: When a patched layer's forward method is called for the first time after networks change, the weight application function:
- Checks whether the current set of loaded networks matches the set already applied (via network_current_names)
- If different, restores original weights from backup
- Iterates over all loaded networks, looking up the corresponding NetworkModule for this layer
- Calls module.calc_updown(weight) to compute the weight delta
- Adds the delta to the weight tensor: weight += updown
- Records the current network set to avoid recomputation
This lazy approach ensures that weight modification only happens when necessary and is automatically triggered by the normal model execution flow.
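The steps listed above can be sketched in plain Python. This is a minimal illustration rather than the WebUI's code: ToyLayer, apply_networks, weights_backup, and current_names are hypothetical names, weights are flat lists instead of tensors, and taking the backup on the first patch is an assumption the list only implies.

```python
class ToyLayer:
    """Hypothetical stand-in for a model layer; the weight is a flat list."""

    def __init__(self, weight):
        self.weight = list(weight)
        self.weights_backup = None   # pristine copy, taken on first patch
        self.current_names = ()      # networks currently baked into weight


def apply_networks(layer, loaded):
    """loaded: list of (name, delta) pairs; delta has the weight's length."""
    wanted = tuple(name for name, _ in loaded)
    if layer.current_names == wanted:
        return False                 # same networks as last time: skip

    if layer.weights_backup is None:
        layer.weights_backup = list(layer.weight)
    layer.weight = list(layer.weights_backup)   # restore originals first

    for _, delta in loaded:          # then add every network's delta
        layer.weight = [w + d for w, d in zip(layer.weight, delta)]

    layer.current_names = wanted     # remember what is applied
    return True


layer = ToyLayer([1.0, 2.0])
first = apply_networks(layer, [("style", [0.5, 0.5])])
second = apply_networks(layer, [("style", [0.5, 0.5])])
after_style = list(layer.weight)
apply_networks(layer, [])            # unloading restores the originals
after_unload = list(layer.weight)
```

Restoring from the backup before re-applying keeps deltas from accumulating when the network set changes between calls.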
Usage
Weight patching is the default mode of LoRA application in the WebUI. It is activated whenever the LoraPatches class is instantiated (typically at extension initialization) and remains active for the lifetime of the application. An alternative "functional" mode exists (lora_functional option) that computes LoRA contributions during the forward pass without modifying weights, but it is slower for multiple stacked networks.
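The difference between the two modes can be illustrated with a small numeric sketch. It assumes that the functional mode amounts to adding the low-rank path's output to the layer's normal output; the text above only states that contributions are computed during the forward pass, so treat that as an assumption. All names (matvec, matmul, W, up, down) are illustrative.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


W = [[1, 0], [0, 1]]     # original weight, shape (2, 2)
up = [[1], [2]]          # shape (2, 1)
down = [[3, 4]]          # shape (1, 2)
x = [1, 1]

# Weight patching: fold the delta into the weight once, then run normally.
delta = matmul(up, down)
W_patched = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
y_patched = matvec(W_patched, x)

# Functional style (assumed): leave W untouched and add the low-rank
# path's output on every forward call.
low_rank_out = matvec(up, matvec(down, x))
y_functional = [a + b for a, b in zip(matvec(W, x), low_rank_out)]
```

Both paths give the same output; the patched path pays the cost once per network change, while the functional path pays a small extra cost on every call for every stacked network.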
Theoretical Basis
Weight Modification Formula
For a given model layer with original weight W, the patched weight is computed as:
W_patched = W_original + sum_i( module_i.calc_updown(W_original) )
For a standard LoRA module, calc_updown computes:
calc_updown(W):
    updown = up_weight @ down_weight    # matrix product: (d, r) @ (r, k) = (d, k)
    updown = updown * (alpha / rank)    # scale by alpha/rank ratio
    updown = updown * multiplier        # scale by user-specified weight
    return updown, ex_bias
Where the multiplier is either te_multiplier (for text encoder layers) or unet_multiplier (for UNet layers), determined by checking whether the layer name starts with "transformer".
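As a worked example of the formula, the sketch below computes a delta with nested lists in place of tensors. It is a simplified illustration: the bias term (ex_bias) is omitted, rank is inferred from the number of rows of down_weight, and the helper matmul is a hypothetical name.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def calc_updown(up_weight, down_weight, alpha, multiplier):
    """Return (alpha / rank) * multiplier * (up @ down); bias handling omitted."""
    rank = len(down_weight)                    # r = number of rows of down
    scale = (alpha / rank) * multiplier
    product = matmul(up_weight, down_weight)   # (d, r) @ (r, k) -> (d, k)
    return [[value * scale for value in row] for row in product]


up = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]      # shape (3, 2)
down = [[1.0, 2.0], [3.0, 4.0]]                # shape (2, 2)
delta = calc_updown(up, down, alpha=1.0, multiplier=0.5)
# up @ down = [[1, 2], [6, 8], [4, 6]], scaled by (1 / 2) * 0.5 = 0.25
```

The delta has the shape of the original weight (d, k), which is what allows it to be added in place.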
Monkey-Patching Architecture
The patching system replaces methods directly on the PyTorch module classes via patches.patch(), so the replacement applies to every instance of each class:
PATCH SETUP:
    torch.nn.Linear.forward             -> network_Linear_forward
    torch.nn.Conv2d.forward             -> network_Conv2d_forward
    torch.nn.GroupNorm.forward          -> network_GroupNorm_forward
    torch.nn.LayerNorm.forward          -> network_LayerNorm_forward
    torch.nn.MultiheadAttention.forward -> network_MultiheadAttention_forward

EACH PATCHED FORWARD(self, input):
    network_apply_weights(self)             # lazy weight modification
    return original_forward(self, input)    # original computation
Additionally, _load_from_state_dict methods are patched to reset cached weights when model state is reloaded, ensuring the weight application system does not use stale cached state.
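The forward-method replacement can be demonstrated without PyTorch. In the sketch below, ToyLinear and apply_weights are hypothetical stand-ins for torch.nn.Linear and the weight application function. Assignment to the class attribute plays the role that patches.patch() plays in the WebUI; the sketch does not reproduce that helper's signature.

```python
class ToyLinear:
    """Hypothetical stand-in for torch.nn.Linear."""

    def __init__(self, scale):
        self.scale = scale
        self.apply_calls = 0

    def forward(self, value):
        return self.scale * value


def apply_weights(layer):
    """Placeholder hook; the real one would modify layer weights lazily."""
    layer.apply_calls += 1


# Keep a handle to the original so the patch can delegate to it and be undone.
original_forward = ToyLinear.forward


def patched_forward(self, value):
    apply_weights(self)                    # hook runs before every forward
    return original_forward(self, value)   # then the original computation


existing = ToyLinear(3)                    # created BEFORE patching
ToyLinear.forward = patched_forward        # class-level replacement
result = existing.forward(2)
calls_after_patch = existing.apply_calls

ToyLinear.forward = original_forward       # undo the patch
existing.forward(2)
calls_after_undo = existing.apply_calls
```

Because the method is replaced on the class rather than on an instance, layers constructed before the patch was installed pick it up as well.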
Change Detection
The system tracks which networks are currently applied to each layer via a network_current_names attribute stored on the module:
wanted_names = tuple((net.name, net.te_multiplier, net.unet_multiplier, net.dyn_dim)
                     for net in loaded_networks)
if self.network_current_names != wanted_names:
    restore_from_backup(self)
    for each network:
        apply delta to self.weight
    self.network_current_names = wanted_names
This ensures that weight modification is skipped entirely when the same set of networks with the same multipliers is already applied, making repeated generation calls with the same prompt configuration efficient.
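A small sketch of the comparison key shows which changes trigger re-application. Net is a hypothetical namedtuple with the four fields named above; real network objects carry more state.

```python
from collections import namedtuple

# Hypothetical record holding only the fields that enter the key.
Net = namedtuple("Net", ["name", "te_multiplier", "unet_multiplier", "dyn_dim"])


def wanted_names(loaded_networks):
    """Build the change-detection key for a list of loaded networks."""
    return tuple(
        (net.name, net.te_multiplier, net.unet_multiplier, net.dyn_dim)
        for net in loaded_networks
    )


base = wanted_names([Net("style", 1.0, 1.0, None)])
repeat = wanted_names([Net("style", 1.0, 1.0, None)])
reweighted = wanted_names([Net("style", 1.0, 0.6, None)])
stacked = wanted_names([Net("style", 1.0, 1.0, None),
                        Net("detail", 1.0, 1.0, None)])

same_config_skips = base == repeat               # nothing to redo
multiplier_change_reapplies = base != reweighted
extra_network_reapplies = base != stacked
```

Because the multipliers are part of the key, changing only a network's strength is enough to force a restore and re-application.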
MultiheadAttention Handling
For torch.nn.MultiheadAttention layers, the weight is stored as a combined in_proj_weight tensor containing Q, K, and V projections concatenated. The system splits this into three chunks, applies separate network modules for each projection, and recombines:
qw, kw, vw = self.in_proj_weight.chunk(3, dim=0)
updown_q, _ = module_q.calc_updown(qw)    # calc_updown returns (updown, ex_bias)
updown_k, _ = module_k.calc_updown(kw)
updown_v, _ = module_v.calc_updown(vw)
updown_qkv = torch.vstack([updown_q, updown_k, updown_v])
updown_out, _ = module_out.calc_updown(self.out_proj.weight)
self.in_proj_weight += updown_qkv
self.out_proj.weight += updown_out
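The split-and-restack bookkeeping can be checked with nested lists. In this sketch the per-projection deltas are hard-coded illustrative values instead of calc_updown results, and chunk3 and add are hypothetical helpers that mimic chunking along dim 0 and element-wise addition.

```python
def chunk3(rows):
    """Split a matrix (list of rows) into three equal row blocks."""
    n = len(rows) // 3
    return rows[:n], rows[n:2 * n], rows[2 * n:]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Combined projection weight: Q rows, then K rows, then V rows. Shape (6, 2).
in_proj = [[1, 0], [0, 1], [2, 0], [0, 2], [3, 0], [0, 3]]
qw, kw, vw = chunk3(in_proj)

# Illustrative deltas, one per projection, each shaped like its chunk.
delta_q = [[10, 0], [0, 0]]
delta_k = [[0, 0], [0, 20]]
delta_v = [[30, 30], [0, 0]]

# Concatenating row lists plays the role of vstack.
delta_qkv = delta_q + delta_k + delta_v
patched = add(in_proj, delta_qkv)
new_q, new_k, new_v = chunk3(patched)
```

Stacking the deltas in the same Q, K, V order as the chunks is what keeps each delta aligned with its own projection.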