Principle:AUTOMATIC1111 Stable diffusion webui Hypernetwork deployment

From Leeroopedia


Knowledge Sources
Domains Deep Learning, Stable Diffusion, Model Deployment
Last Updated 2026-02-08 00:00 GMT

Overview

Hypernetwork deployment is the process of saving trained hypernetwork weights to a portable checkpoint file, loading them at inference time, and applying the learned cross-attention transformations during image generation without modifying the base diffusion model.

Description

After training, a hypernetwork must be serialized so that it can be distributed, then loaded and applied at inference time. The deployment pipeline has three stages:

Saving:

  • The trained hypernetwork state (layer weights, metadata, training step) is serialized to a .pt file using PyTorch's torch.save().
  • Optimizer state can optionally be saved to a separate .pt.optim file for training resumption.
  • Metadata includes the layer structure, activation function, weight initialization, dropout configuration, and the Stable Diffusion checkpoint used during training.

Loading:

  • A .pt file is loaded and its metadata is parsed to reconstruct the correct layer structure, activation functions, and dropout configuration.
  • HypernetworkModule instances are created for each stored attention dimension, initialized with the saved state dictionaries.
  • The optional optimizer state file is used only if the hash it stores matches the hash of the hypernetwork file, so that a stale or unrelated optimizer state is never paired with the loaded weights.

Applying during inference:

  • During image generation, one or more hypernetworks are loaded into shared.loaded_hypernetworks.
  • The cross-attention forward function is hijacked to call apply_hypernetworks() before computing K and V projections.
  • Each hypernetwork transforms the context through its paired modules: one for K and one for V.
  • Multiple hypernetworks can be stacked, each applying its transformation sequentially.
  • A multiplier controls the strength of each hypernetwork's effect, scaling the residual term.

Usage

Use hypernetwork deployment when:

  • Saving a trained hypernetwork for distribution or archival.
  • Loading a hypernetwork at inference time to modify generated image style or content.
  • Combining multiple hypernetworks with different multipliers for blended effects.

Theoretical Basis

Residual Cross-Attention Modification

At inference time, the hypernetwork applies a residual transformation to the cross-attention context:

# Original cross-attention (without hypernetwork):
K = W_k * context
V = W_v * context

# With hypernetwork applied:
context_k = context + MLP_k(context) * multiplier
context_v = context + MLP_v(context) * multiplier
K = W_k * context_k
V = W_v * context_v

The multiplier parameter (set via set_multiplier()) allows controlling the strength of the hypernetwork's effect without retraining:

  • multiplier = 0.0: No effect (identity transformation)
  • multiplier = 1.0: Full trained effect (default)
  • multiplier > 1.0: Amplified effect (may cause artifacts)

During training, the multiplier is always 1.0 to ensure consistent gradient computation. The adjustable multiplier is an inference-only feature.
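The effect of the multiplier can be illustrated with a minimal, dependency-free sketch. It is not the webui implementation: plain Python lists stand in for tensors, and apply_residual, toy_mlp, and the training flag are hypothetical names introduced only for this example.

```python
from typing import Callable, List

Vector = List[float]


def apply_residual(context: Vector,
                   mlp: Callable[[Vector], Vector],
                   multiplier: float = 1.0,
                   training: bool = False) -> Vector:
    """Return context + mlp(context) * multiplier.

    While training, the multiplier is ignored and treated as 1.0,
    matching the inference-only behaviour described above.
    """
    scale = 1.0 if training else multiplier
    delta = mlp(context)
    return [c + d * scale for c, d in zip(context, delta)]


def toy_mlp(x: Vector) -> Vector:
    # Hypothetical stand-in for a trained MLP_k or MLP_v: doubles each element.
    return [2.0 * v for v in x]


context = [1.0, -2.0, 0.5]
identity = apply_residual(context, toy_mlp, multiplier=0.0)  # no effect
full = apply_residual(context, toy_mlp, multiplier=1.0)      # full trained effect
half = apply_residual(context, toy_mlp, multiplier=0.5)      # weakened effect
train = apply_residual(context, toy_mlp, multiplier=0.0, training=True)
```

With multiplier = 0.0 the context passes through unchanged, and with training=True the requested multiplier is overridden so the result equals the full effect.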

Stacking Multiple Hypernetworks

Multiple hypernetworks can be applied sequentially:

context_k = context
context_v = context
for each hypernetwork in loaded_hypernetworks:
    context_k = hypernetwork.module_k(context_k)  # residual applied
    context_v = hypernetwork.module_v(context_v)  # residual applied
K = W_k * context_k
V = W_v * context_v

Each hypernetwork applies its own residual transformation to the already-transformed context, enabling compositional style control.
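The stacking loop above can be sketched in runnable form as follows. This is an illustration under simplifying assumptions: each hypernetwork is represented as a (module_k, module_v) pair of plain Python callables, and make_residual_module and apply_stack are hypothetical helpers, not functions from the webui codebase.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Module = Callable[[Vector], Vector]


def make_residual_module(weight: float, multiplier: float = 1.0) -> Module:
    """Hypothetical module computing x + (weight * x) * multiplier."""
    def module(x: Vector) -> Vector:
        return [v + weight * v * multiplier for v in x]
    return module


def apply_stack(context: Vector,
                hypernetworks: Sequence[Tuple[Module, Module]]
                ) -> Tuple[Vector, Vector]:
    """Feed the context through every hypernetwork in order.

    Each hypernetwork receives the output of the previous one, so the
    residuals compose rather than being summed independently.
    """
    context_k, context_v = list(context), list(context)
    for module_k, module_v in hypernetworks:
        context_k = module_k(context_k)
        context_v = module_v(context_v)
    return context_k, context_v


# Two hypothetical hypernetworks, each a (K module, V module) pair.
style_a = (make_residual_module(1.0), make_residual_module(0.5))
style_b = (make_residual_module(1.0, multiplier=0.5), make_residual_module(0.0))

stacked_k, stacked_v = apply_stack([1.0, 2.0], [style_a, style_b])
unchanged_k, unchanged_v = apply_stack([1.0, 2.0], [])
```

Because each module sees the already-transformed context, the order of the list can matter whenever the modules are not simple scalings like the ones used here.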

Save Format Specification

The .pt checkpoint file contains a dictionary with:

{
    <dim>: (k_module_state_dict, v_module_state_dict),  # for each attention dimension
    'step': int,                    # training step count
    'name': str,                    # hypernetwork name
    'layer_structure': list,        # e.g., [1, 2, 1]
    'activation_func': str,         # e.g., "relu"
    'is_layer_norm': bool,          # whether LayerNorm is used
    'weight_initialization': str,   # e.g., "Normal"
    'sd_checkpoint': str,           # base model hash
    'sd_checkpoint_name': str,      # base model name
    'activate_output': bool,        # whether last layer has activation
    'use_dropout': bool,            # whether dropout is enabled
    'dropout_structure': list,      # per-layer dropout probabilities
    'last_layer_dropout': bool,     # dropout on the last layer
    'optional_info': str or None,   # user-provided description
}
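Because layer weights and metadata share one dictionary, a loader has to tell the two kinds of keys apart. The sketch below assumes that attention dimensions are stored as integer keys and metadata as string keys; split_checkpoint is a hypothetical helper, and the empty dictionaries are placeholders for real state dictionaries.

```python
from typing import Any, Dict, Tuple


def split_checkpoint(state: Dict[Any, Any]
                     ) -> Tuple[Dict[int, tuple], Dict[str, Any]]:
    """Separate per-dimension (K, V) state dicts from metadata entries.

    Assumption: attention dimensions are integer keys and every other
    key is a metadata field.
    """
    layers: Dict[int, tuple] = {}
    metadata: Dict[str, Any] = {}
    for key, value in state.items():
        if isinstance(key, int):
            layers[key] = value
        else:
            metadata[key] = value
    return layers, metadata


example_state = {
    768: ({}, {}),    # placeholder (K, V) state dicts
    1024: ({}, {}),
    "step": 1000,
    "name": "example",
    "layer_structure": [1, 2, 1],
    "activation_func": "relu",
}

layers, metadata = split_checkpoint(example_state)
# Fields absent from the file can fall back to a caller-chosen default.
use_dropout = metadata.get("use_dropout", False)
```

Reading optional fields with a default, as in the last line, is one way to tolerate checkpoints that lack some metadata entries; the default shown here is illustrative only.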

The optional .pt.optim file contains:

{
    'optimizer_name': str,          # e.g., "AdamW"
    'hash': str,                    # SHA256 short hash of the hypernetwork file
    'optimizer_state_dict': dict,   # PyTorch optimizer state
}

Hash Verification

When loading an optimizer state, the system computes a SHA256 hash of the hypernetwork .pt file and compares it against the hash stored in the .pt.optim file. A match indicates that the optimizer state was saved for the current weights. A mismatch, for example after the .pt file has been replaced or modified, means the optimizer state does not belong to these weights and is not loaded.
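A hash comparison of this kind can be sketched with the Python standard library. The helper names short_sha256 and optimizer_state_matches are hypothetical, and the 10-character prefix length is an assumption made for illustration; the source only states that a short SHA256 hash is used.

```python
import hashlib
import os
import tempfile


def short_sha256(path: str, length: int = 10) -> str:
    """Hash a file in chunks and return a prefix of the hex digest.

    The prefix length is an assumption for this sketch.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def optimizer_state_matches(hypernetwork_path: str, optim_record: dict) -> bool:
    """True only if the stored hash equals the hash of the weights file."""
    return optim_record.get("hash") == short_sha256(hypernetwork_path)


with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "example.pt")
    with open(weights_path, "wb") as f:
        f.write(b"weights-v1")  # stand-in bytes, not a real checkpoint

    record = {
        "optimizer_name": "AdamW",
        "hash": short_sha256(weights_path),
        "optimizer_state_dict": {},
    }
    matches_before = optimizer_state_matches(weights_path, record)

    # Simulate the weights file being replaced after the record was saved.
    with open(weights_path, "wb") as f:
        f.write(b"weights-v2")
    matches_after = optimizer_state_matches(weights_path, record)
```

A record with no hash entry also fails the check, since the lookup returns None rather than a matching string.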
