
Principle:AUTOMATIC1111 Stable diffusion webui Network weight patching

From Leeroopedia


Knowledge Sources
Domains Stable Diffusion, LoRA, Monkey-Patching, Weight Injection, PyTorch
Last Updated 2026-02-08 00:00 GMT

Overview

Network weight patching is the technique of modifying a pretrained model's weight tensors at runtime by injecting low-rank adaptation deltas computed from loaded network modules, using monkey-patched forward methods to trigger weight application lazily on first use.

Description

Stable Diffusion WebUI implements LoRA weight application through a two-layer mechanism:

Layer 1 -- Monkey-patching forward methods: The LoraPatches class replaces the forward() method on the torch.nn.Linear, torch.nn.Conv2d, torch.nn.GroupNorm, torch.nn.LayerNorm, and torch.nn.MultiheadAttention classes with custom versions. Because the replacement is made on the classes themselves, every instance of these layer types in the process is affected. Each replacement calls the weight application function before delegating to the original forward method.

Layer 2 -- Lazy weight modification: The first time a patched layer's forward method is called after the set of loaded networks changes, the weight application function:

  1. Checks whether the current set of loaded networks matches the set already applied (via network_current_names)
  2. If different, restores original weights from backup
  3. Iterates over all loaded networks, looking up the corresponding NetworkModule for this layer
  4. Calls module.calc_updown(weight) to compute the weight delta
  5. Adds the delta to the weight tensor: weight += updown
  6. Records the current network set to avoid recomputation

This lazy approach ensures that weight modification only happens when necessary and is automatically triggered by the normal model execution flow.
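
The six steps above can be sketched in plain Python without PyTorch. This is a minimal illustration under stated assumptions, not the WebUI's code: ToyLayer, ToyNetwork, loaded_networks, and apply_weights are hypothetical stand-ins, weights are flat lists of floats rather than tensors, and the change-detection key is reduced to (name, multiplier).

```python
# Minimal sketch of lazy weight patching; every name here is an illustrative
# stand-in, not one of the WebUI's real classes or functions.

class ToyNetwork:
    def __init__(self, name, multiplier, delta):
        self.name = name
        self.multiplier = multiplier
        self.delta = delta  # precomputed per-element delta for the toy layer

    def calc_updown(self, weight):
        # A real module would derive this from its low-rank factors.
        return [d * self.multiplier for d in self.delta]


loaded_networks = []  # mutated whenever the user changes the active networks


class ToyLayer:
    def __init__(self, weight):
        self.weight = list(weight)
        self.weight_backup = None
        self.current_names = ()
        self.apply_count = 0  # how many times weights were recomputed

    def forward(self, x):
        apply_weights(self)  # lazy step, runs before the real computation
        return sum(w * xi for w, xi in zip(self.weight, x))


def apply_weights(layer):
    wanted = tuple((n.name, n.multiplier) for n in loaded_networks)
    if layer.current_names == wanted:
        return  # step 1: same networks already applied, nothing to do
    if layer.weight_backup is None:
        layer.weight_backup = list(layer.weight)  # keep the originals once
    layer.weight = list(layer.weight_backup)      # step 2: restore from backup
    for net in loaded_networks:                   # step 3: each loaded network
        updown = net.calc_updown(layer.weight)    # step 4: compute the delta
        layer.weight = [w + u for w, u in zip(layer.weight, updown)]  # step 5
    layer.current_names = wanted                  # step 6: remember the set
    layer.apply_count += 1


layer = ToyLayer([1.0, 2.0])
loaded_networks.append(ToyNetwork("style", 0.5, [2.0, 4.0]))
first = layer.forward([1.0, 1.0])   # triggers patching
second = layer.forward([1.0, 1.0])  # reuses the patched weights
loaded_networks.clear()
third = layer.forward([1.0, 1.0])   # restores the original weights
```

In this sketch the second call does no weight work at all, and removing the network causes exactly one more recomputation, which restores the backup.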

Usage

Weight patching is the default mode of LoRA application in the WebUI. It is activated whenever the LoraPatches class is instantiated (typically at extension initialization) and remains active for the lifetime of the application. An alternative "functional" mode exists (lora_functional option) that computes LoRA contributions during the forward pass without modifying weights, but it is slower for multiple stacked networks.

Theoretical Basis

Weight Modification Formula

For a given model layer with original weight W, the patched weight is computed as:

W_patched = W_original + sum_i( module_i.calc_updown(W_original) )

For a standard LoRA module, calc_updown computes:

calc_updown(W):
    updown = up_weight @ down_weight    # matrix product: (d, r) @ (r, k) = (d, k)
    updown = updown * (alpha / rank)     # scale by alpha/rank ratio
    updown = updown * multiplier         # scale by user-specified weight
    return updown, ex_bias               # ex_bias: the module's optional bias delta (not computed in this sketch)

Where the multiplier is either te_multiplier (for text encoder layers) or unet_multiplier (for UNet layers), determined by checking whether the layer name starts with "transformer".
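
A small worked example may help make the scaling concrete. The sketch below uses nested Python lists in place of tensors, and the names matmul and lora_delta are hypothetical helpers introduced only for illustration. The numbers are chosen so that the arithmetic is exact in floating point.

```python
# Illustrative only: nested lists stand in for tensors, and lora_delta is a
# hypothetical helper, not a function from the WebUI.

def matmul(a, b):
    """Multiply a (d x r) matrix by an (r x k) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(up, down, alpha, multiplier):
    rank = len(down)                     # r: number of rows in the down matrix
    scale = (alpha / rank) * multiplier  # alpha/rank ratio times user weight
    return [[v * scale for v in row] for row in matmul(up, down)]


up = [[1.0, 0.0],
      [0.0, 2.0]]            # shape (d=2, r=2)
down = [[4.0, 8.0, 12.0],
        [2.0, 6.0, 10.0]]    # shape (r=2, k=3)

# alpha / rank = 1.0 / 2 = 0.5, times multiplier 0.5 gives a scale of 0.25.
delta = lora_delta(up, down, alpha=1.0, multiplier=0.5)
```

Here the product up @ down is [[4, 8, 12], [4, 12, 20]], so the scaled delta has the same (d, k) shape as the weight it would be added to.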

Monkey-Patching Architecture

The patching system replaces the forward attribute on each PyTorch layer class via patches.patch():

PATCH SETUP:
    torch.nn.Linear.forward        -> network_Linear_forward
    torch.nn.Conv2d.forward         -> network_Conv2d_forward
    torch.nn.GroupNorm.forward      -> network_GroupNorm_forward
    torch.nn.LayerNorm.forward      -> network_LayerNorm_forward
    torch.nn.MultiheadAttention.forward -> network_MultiheadAttention_forward

EACH PATCHED FORWARD(self, input):
    network_apply_weights(self)    # lazy weight modification
    return original_forward(self, input)  # original computation

Additionally, _load_from_state_dict methods are patched to reset cached weights when model state is reloaded, ensuring the weight application system does not use stale cached state.
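
The class-level replacement pattern can be illustrated without PyTorch. In the sketch below, FakeLinear stands in for torch.nn.Linear, and patch() and undo() are hypothetical helpers written for this example; they are not the WebUI's patches module, whose exact signatures are not shown in this page.

```python
# Sketch of class-level method replacement. FakeLinear stands in for
# torch.nn.Linear; patch() and undo() are hypothetical helpers.

class FakeLinear:
    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return self.scale * x


calls = []       # records which layers ran the pre-forward step
_originals = {}  # (class, attribute name) -> original function


def patch(cls, name, replacement):
    """Replace cls.<name> and return the original for delegation."""
    original = getattr(cls, name)
    _originals[(cls, name)] = original
    setattr(cls, name, replacement)
    return original


def undo(cls, name):
    """Put the original attribute back on the class."""
    setattr(cls, name, _originals.pop((cls, name)))


layer = FakeLinear(3)  # created BEFORE patching, yet still affected


def patched_forward(self, x):
    calls.append(self)                # stand-in for network_apply_weights(self)
    return original_forward(self, x)  # delegate to the original computation


original_forward = patch(FakeLinear, "forward", patched_forward)
patched_result = layer.forward(2)

undo(FakeLinear, "forward")
restored_result = layer.forward(2)
```

Because the attribute is swapped on the class rather than on an instance, layers constructed before the patch pick up the new behaviour, which is what lets this approach reach layers inside an already-loaded model.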

Change Detection

The system tracks which networks are currently applied to each layer via a network_current_names attribute stored on the module:

wanted_names = tuple((net.name, net.te_multiplier, net.unet_multiplier, net.dyn_dim)
                     for net in loaded_networks)

if self.network_current_names != wanted_names:
    restore_from_backup(self)
    for each network:
        apply delta to self.weight
    self.network_current_names = wanted_names

This ensures that weight modification is skipped entirely when the same set of networks with the same multipliers is already applied, making repeated generation calls with the same prompt configuration efficient.
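
The behaviour of the comparison key can be checked in isolation. In this sketch, Net is a hypothetical namedtuple carrying only the four fields used in the key; the real network objects hold much more.

```python
from collections import namedtuple

# Net is a hypothetical stand-in with only the fields that enter the key.
Net = namedtuple("Net", "name te_multiplier unet_multiplier dyn_dim")


def wanted_names(networks):
    return tuple((n.name, n.te_multiplier, n.unet_multiplier, n.dyn_dim)
                 for n in networks)


applied = wanted_names([Net("style", 1.0, 0.8, None)])

same = wanted_names([Net("style", 1.0, 0.8, None)])        # identical request
reweighted = wanted_names([Net("style", 1.0, 0.6, None)])  # multiplier changed

ordered = wanted_names([Net("a", 1.0, 1.0, None), Net("b", 1.0, 1.0, None)])
reordered = wanted_names([Net("b", 1.0, 1.0, None), Net("a", 1.0, 1.0, None)])
```

An identical request produces an equal tuple and would skip reapplication, while a changed multiplier produces a different one. Since tuple equality is order-sensitive, a key built this way also differs when the same networks are listed in a different order.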

MultiheadAttention Handling

For torch.nn.MultiheadAttention layers, the weight is stored as a combined in_proj_weight tensor containing Q, K, and V projections concatenated. The system splits this into three chunks, applies separate network modules for each projection, and recombines:

qw, kw, vw = self.in_proj_weight.chunk(3, dim=0)
updown_q, _ = module_q.calc_updown(qw)    # calc_updown returns (updown, ex_bias)
updown_k, _ = module_k.calc_updown(kw)
updown_v, _ = module_v.calc_updown(vw)
updown_qkv = torch.vstack([updown_q, updown_k, updown_v])
updown_out, ex_bias = module_out.calc_updown(self.out_proj.weight)  # ex_bias handling omitted here
self.in_proj_weight += updown_qkv
self.out_proj.weight += updown_out
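
The split-and-restack step can be traced with plain lists. In the sketch below, chunk_rows and constant_delta are hypothetical helpers: the first mimics a three-way split along dimension 0, and the second pretends each projection's module returns a constant delta so the row layout is easy to follow.

```python
# Illustrative only: rows of nested lists stand in for tensor rows.

def chunk_rows(matrix, parts):
    """Split a matrix into equal groups of consecutive rows."""
    size = len(matrix) // parts
    return [matrix[i * size:(i + 1) * size] for i in range(parts)]


def constant_delta(block, value):
    """Pretend module output: a delta shaped like the block it was given."""
    return [[value for _ in row] for row in block]


embed_dim = 2
# Combined projection: rows 0-1 are Q, rows 2-3 are K, rows 4-5 are V.
in_proj_weight = [[float(r)] * embed_dim for r in range(3 * embed_dim)]

qw, kw, vw = chunk_rows(in_proj_weight, 3)

updown_q = constant_delta(qw, 10.0)
updown_k = constant_delta(kw, 20.0)
updown_v = constant_delta(vw, 30.0)

# Concatenating the row lists plays the role of vstack and restores the
# (3 * embed_dim, embed_dim) layout of the combined weight.
updown_qkv = updown_q + updown_k + updown_v
patched = [[w + u for w, u in zip(w_row, u_row)]
           for w_row, u_row in zip(in_proj_weight, updown_qkv)]
```

Each projection's delta lands on exactly the rows it was computed from, so the combined weight keeps its Q, K, V ordering after patching.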

Related Pages

Implemented By
