Principle:AUTOMATIC1111 Stable diffusion webui Hypernetwork deployment
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Stable Diffusion, Model Deployment |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Hypernetwork deployment is the process of saving trained hypernetwork weights to a portable checkpoint file, loading them at inference time, and applying the learned cross-attention transformations during image generation without modifying the base diffusion model.
Description
After training, a hypernetwork must be serialized, distributed, and applied at inference time. The deployment pipeline involves three stages:
Saving:
- The trained hypernetwork state (layer weights, metadata, training step) is serialized to a `.pt` file using PyTorch's `torch.save()`.
- Optimizer state can optionally be saved to a separate `.pt.optim` file for training resumption.
- Metadata includes the layer structure, activation function, weight initialization, dropout configuration, and the Stable Diffusion checkpoint used during training.
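The saving step can be sketched as follows. This is a minimal illustration, not the webui's code. It uses the standard-library `pickle` module as a stand-in for `torch.save()` (which is itself pickle-based) so that it runs without PyTorch. The helper names `save_hypernetwork` and `short_hash` are hypothetical, and truncating the SHA256 digest to 10 hex characters is an assumption.

```python
import hashlib
import pickle


def short_hash(path, length=10):
    """Hypothetical helper: first `length` hex chars of the file's SHA256 (length is assumed)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()[:length]


def save_hypernetwork(filename, layers, metadata, optimizer_name=None,
                      optimizer_state=None):
    """Write the checkpoint dict to `filename` and, optionally, the
    optimizer state to `filename + '.optim'`.

    layers:   {dim: (k_state_dict, v_state_dict)}
    metadata: {'step': ..., 'name': ..., 'layer_structure': ..., ...}
    """
    state = {dim: (k_sd, v_sd) for dim, (k_sd, v_sd) in layers.items()}
    state.update(metadata)
    with open(filename, "wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save(state, filename)

    if optimizer_state is not None:
        optim = {
            "optimizer_name": optimizer_name,
            # Hash the .pt file that was just written, so it matches on reload.
            "hash": short_hash(filename),
            "optimizer_state_dict": optimizer_state,
        }
        with open(filename + ".optim", "wb") as f:
            pickle.dump(optim, f)
```

The order matters in this sketch: the `.pt` file is written first, so the hash stored in `.optim` describes the weights file exactly as it exists on disk.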
Loading:
- A `.pt` file is loaded and its metadata is parsed to reconstruct the correct layer structure, activation functions, and dropout configuration.
- `HypernetworkModule` instances are created for each stored attention dimension, initialized with the saved state dictionaries.
- The optional optimizer state file is loaded only if its hash matches the hypernetwork weights, ensuring consistency.
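A minimal sketch of the parsing step. It assumes the checkpoint layout given under Save Format Specification, where integer keys are attention dimensions and string keys are metadata. `load_hypernetwork` and the fallback values in `METADATA_DEFAULTS` are hypothetical, chosen for illustration rather than taken from the webui. `pickle` stands in for `torch.load()`.

```python
import copy
import pickle

# Hypothetical fallbacks for keys an older file might lack;
# illustrative only, not the webui's actual defaults.
METADATA_DEFAULTS = {
    "step": 0,
    "name": None,
    "layer_structure": [1, 2, 1],
    "activation_func": None,
    "is_layer_norm": False,
    "weight_initialization": "Normal",
    "sd_checkpoint": None,
    "sd_checkpoint_name": None,
    "activate_output": False,
    "use_dropout": False,
    "dropout_structure": None,
    "last_layer_dropout": False,
    "optional_info": None,
}


def load_hypernetwork(filename):
    """Split a saved checkpoint into (metadata, {dim: (k_state, v_state)})."""
    with open(filename, "rb") as f:
        state = pickle.load(f)  # stand-in for torch.load(filename, map_location="cpu")
    # Copy defaults so callers cannot mutate the shared fallback lists.
    metadata = {
        key: state[key] if key in state else copy.deepcopy(default)
        for key, default in METADATA_DEFAULTS.items()
    }
    # Integer keys hold the per-dimension (K, V) state dicts; a real loader
    # would build one HypernetworkModule pair per entry from these.
    layers = {dim: pair for dim, pair in state.items() if isinstance(dim, int)}
    return metadata, layers
```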
Applying during inference:
- During image generation, one or more hypernetworks are loaded into `shared.loaded_hypernetworks`.
- The cross-attention forward function is hijacked to call `apply_hypernetworks()` before computing the K and V projections.
- Each hypernetwork transforms the context through its paired modules: one for K and one for V.
- Multiple hypernetworks can be stacked, each applying its transformation sequentially.
- A multiplier controls the strength of each hypernetwork's effect, scaling the residual term.
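The hijack described above can be sketched with a toy attention layer. Everything here is hypothetical and simplified: weights are scalars, each token has one feature, and modules are plain callables. The replacement `forward` routes the context through `apply_hypernetworks()` before the K and V projections, leaves Q alone, and never modifies the base weights.

```python
class CrossAttention:
    """Toy cross-attention layer: scalar weights, one feature per token."""

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v

    def forward(self, x, context):
        q = [self.w_q * t for t in x]
        k = [self.w_k * t for t in context]
        v = [self.w_v * t for t in context]
        return q, k, v


# Stand-in for shared.loaded_hypernetworks: a list of (module_k, module_v) callables.
loaded_hypernetworks = []


def apply_hypernetworks(hypernetworks, context):
    """Run the context through every loaded (module_k, module_v) pair in order."""
    context_k = context_v = context
    for module_k, module_v in hypernetworks:
        context_k, context_v = module_k(context_k), module_v(context_v)
    return context_k, context_v


def hijacked_forward(self, x, context):
    # Same projections as forward(), but K and V see the transformed context.
    context_k, context_v = apply_hypernetworks(loaded_hypernetworks, context)
    q = [self.w_q * t for t in x]  # Q is computed from x and is unaffected
    k = [self.w_k * t for t in context_k]
    v = [self.w_v * t for t in context_v]
    return q, k, v


# The hijack: replace the method on the class; the layer's weights are untouched.
CrossAttention.forward = hijacked_forward
```

In this sketch, an empty `loaded_hypernetworks` list makes the hijacked layer behave exactly like the original.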
Usage
Use hypernetwork deployment when:
- Saving a trained hypernetwork for distribution or archival.
- Loading a hypernetwork at inference time to modify generated image style or content.
- Combining multiple hypernetworks with different multipliers for blended effects.
Theoretical Basis
Residual Cross-Attention Modification
At inference time, the hypernetwork applies a residual transformation to the cross-attention context:
```
# Original cross-attention (without hypernetwork):
K = W_k * context
V = W_v * context

# With hypernetwork applied:
context_k = context + MLP_k(context) * multiplier
context_v = context + MLP_v(context) * multiplier
K = W_k * context_k
V = W_v * context_v
```
The `multiplier` parameter (set via `set_multiplier()`) controls the strength of the hypernetwork's effect without retraining:
- `multiplier = 0.0`: no effect (identity transformation)
- `multiplier = 1.0`: full trained effect (default)
- `multiplier > 1.0`: amplified effect (may cause artifacts)
During training, the multiplier is always 1.0 to ensure consistent gradient computation. The adjustable multiplier is an inference-only feature.
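A toy version of one residual module with an adjustable multiplier. It assumes the module is a callable on a list of floats. `ResidualModule`, `tiny_mlp`, and the `training` flag are hypothetical names, and `set_multiplier()` is placed directly on the toy module for brevity.

```python
def tiny_mlp(x):
    # Stand-in for the trained Linear -> activation -> Linear stack.
    return [max(0.0, 2.0 * xi - 1.0) for xi in x]


class ResidualModule:
    """Toy hypernetwork module computing x + mlp(x) * multiplier."""

    def __init__(self, mlp):
        self.mlp = mlp
        self.multiplier = 1.0  # full trained effect by default
        self.training = False

    def set_multiplier(self, multiplier):
        self.multiplier = multiplier
        return self

    def __call__(self, x):
        # During training the multiplier is forced to 1.0.
        m = 1.0 if self.training else self.multiplier
        return [xi + yi * m for xi, yi in zip(x, self.mlp(x))]
```

With input `[1.0, 0.0]`, the module returns `[2.0, 0.0]` at the default multiplier, `[1.0, 0.0]` (the input unchanged) at `0.0`, and `[1.5, 0.0]` at `0.5`.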
Stacking Multiple Hypernetworks
Multiple hypernetworks can be applied sequentially:
```
context_k = context
context_v = context
for hypernetwork in loaded_hypernetworks:
    context_k = hypernetwork.module_k(context_k)  # residual applied
    context_v = hypernetwork.module_v(context_v)  # residual applied
K = W_k * context_k
V = W_v * context_v
```
Each hypernetwork applies its own residual transformation to the already-transformed context, enabling compositional style control.
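Sequential stacking can be sketched as below. The sketch assumes each hypernetwork stores its module pairs in a dict keyed by feature width, mirroring the `<dim>` keys in the save format, and passes the context through unchanged when it has no pair for the current width. The helper names and the `scale_tokens` module are hypothetical.

```python
def scale_tokens(factor):
    """Hypothetical residual module: x + factor * x for every feature of every token."""
    return lambda ctx: [[x + factor * x for x in tok] for tok in ctx]


def apply_single(layers, context_k, context_v):
    """Apply one hypernetwork; `layers` maps feature width -> (module_k, module_v)."""
    width = len(context_k[0])  # feature width of each token vector
    pair = layers.get(width)
    if pair is None:  # no module for this width: pass the context through
        return context_k, context_v
    module_k, module_v = pair
    return module_k(context_k), module_v(context_v)


def apply_hypernetworks(hypernetworks, context):
    context_k = context_v = context
    for layers in hypernetworks:  # each hypernetwork sees the previous one's output
        context_k, context_v = apply_single(layers, context_k, context_v)
    return context_k, context_v
```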
Save Format Specification
The .pt checkpoint file contains a dictionary with:
```
{
    <dim>: (k_module_state_dict, v_module_state_dict),  # for each attention dimension
    'step': int,                   # training step count
    'name': str,                   # hypernetwork name
    'layer_structure': list,       # e.g., [1, 2, 1]
    'activation_func': str,        # e.g., "relu"
    'is_layer_norm': bool,         # whether LayerNorm is used
    'weight_initialization': str,  # e.g., "Normal"
    'sd_checkpoint': str,          # base model hash
    'sd_checkpoint_name': str,     # base model name
    'activate_output': bool,       # whether last layer has activation
    'use_dropout': bool,           # whether dropout is enabled
    'dropout_structure': list,     # per-layer dropout probabilities
    'last_layer_dropout': bool,    # dropout on the last layer
    'optional_info': str or None,  # user-provided description
}
```
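The following minimal sketch makes `layer_structure` concrete. It assumes each entry is a width multiplier on the attention dimension and computes the linear-layer shapes a loader would rebuild for one module. `linear_shapes` is a hypothetical helper. The start-and-end-at-1 check reflects the requirement that the MLP output have the same width as its input so that it can be added back as a residual.

```python
def linear_shapes(dim, layer_structure):
    """(in_features, out_features) of each linear layer in one module."""
    # Output width must equal input width for the residual addition to work.
    if layer_structure[0] != 1 or layer_structure[-1] != 1:
        raise ValueError("layer_structure must start and end with 1")
    return [
        (int(dim * a), int(dim * b))
        for a, b in zip(layer_structure, layer_structure[1:])
    ]
```

For `dim = 768` and `[1, 2, 1]`, this gives a 768→1536 layer followed by a 1536→768 layer.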
The optional .pt.optim file contains:
```
{
    'optimizer_name': str,         # e.g., "AdamW"
    'hash': str,                   # SHA256 short hash of the hypernetwork file
    'optimizer_state_dict': dict,  # PyTorch optimizer state
}
```
Hash Verification
When loading an optimizer state, the system computes a short SHA256 hash of the hypernetwork `.pt` file and compares it against the `hash` stored in the `.optim` file. A mismatch means the weights file was replaced or modified after the optimizer state was saved, so the optimizer state is not used. This prevents training from resuming with optimizer statistics that belong to different weights.
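The verification step can be sketched with the standard library alone. This is an illustration under assumptions: `pickle` stands in for `torch.load()`, `load_optimizer_state` and `short_hash` are hypothetical names, and the 10-character truncation of the SHA256 digest is assumed.

```python
import hashlib
import os
import pickle


def short_hash(path, length=10):
    """First `length` hex chars of the file's SHA256 (truncation length assumed)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()[:length]


def load_optimizer_state(filename):
    """Return the saved optimizer state for `filename`, or None if absent or stale."""
    optim_path = filename + ".optim"
    if not os.path.exists(optim_path):
        return None
    with open(optim_path, "rb") as f:
        saved = pickle.load(f)  # stand-in for torch.load(optim_path)
    if saved.get("hash") != short_hash(filename):
        return None  # weights changed since the optimizer state was saved
    return saved.get("optimizer_state_dict")
```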