Implementation:Huggingface Diffusers ControlNetModel From Pretrained

From Leeroopedia
| Property | Value |
|---|---|
| Implementation Name | ControlNetModel.from_pretrained |
| Type | API Doc |
| Workflow | ControlNet_Guided_Generation |
| Related Principle | Huggingface_Diffusers_ControlNet_Architecture |
| Source File | src/diffusers/models/controlnets/controlnet.py |
| Lines | L72-L206 (ControlNetConditioningEmbedding + ControlNetModel.__init__) |
| Status | Active |
| Implements | Principle:Huggingface_Diffusers_ControlNet_Architecture |

API Signature

ControlNetModel.from_pretrained(
    pretrained_model_name_or_path: str | os.PathLike,
    **kwargs,
) -> ControlNetModel

Class: ControlNetModel

Import:

from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

Constructor Parameters (ControlNetModel.__init__)

| Parameter | Type | Default | Description |
|---|---|---|---|
| in_channels | int | 4 | Number of channels in the input sample (latent channels). |
| conditioning_channels | int | 3 | Number of channels in the conditioning image (typically RGB, i.e. 3). |
| flip_sin_to_cos | bool | True | Whether to flip sin to cos in the time embedding. |
| freq_shift | int | 0 | Frequency shift for the time embedding. |
| down_block_types | tuple[str, ...] | ("CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D") | Types of down blocks to use, mirroring the UNet encoder. |
| mid_block_type | str or None | "UNetMidBlock2DCrossAttn" | Type of mid block. |
| block_out_channels | tuple[int, ...] | (320, 640, 1280, 1280) | Output channels for each block. |
| layers_per_block | int | 2 | Number of ResNet layers per down block. |
| cross_attention_dim | int | 1280 | Dimension of cross attention features (text embedding dim). |
| attention_head_dim | int or tuple[int, ...] | 8 | Dimension of attention heads. |
| conditioning_embedding_out_channels | tuple[int, ...] | (16, 32, 96, 256) | Channel progression in the conditioning embedding network. |
| controlnet_conditioning_channel_order | str | "rgb" | Channel order of the conditioning image ("rgb" or "bgr"). |
| global_pool_conditions | bool | False | Whether to global-average-pool the output residuals over their spatial dimensions (associated with guess mode behavior). |
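
To make `flip_sin_to_cos` and `freq_shift` more concrete, here is a minimal pure-Python sketch of a sinusoidal timestep projection for a single timestep. It follows the formulation as understood from diffusers' `get_timestep_embedding`; the function name `timestep_projection`, the scalar interface, and the `max_period=10000` constant are assumptions made for illustration, not the library's API.

```python
import math


def timestep_projection(timestep, dim, flip_sin_to_cos=True, freq_shift=0, max_period=10000):
    """Project one timestep into `dim` sinusoidal features (hypothetical helper; `dim` must be even)."""
    half = dim // 2
    # Geometric ladder of frequencies; freq_shift changes the exponent's denominator.
    freqs = [math.exp(-math.log(max_period) * i / (half - freq_shift)) for i in range(half)]
    sin_part = [math.sin(timestep * f) for f in freqs]
    cos_part = [math.cos(timestep * f) for f in freqs]
    # The unflipped layout is [sin, cos]; flip_sin_to_cos swaps the halves to [cos, sin].
    return cos_part + sin_part if flip_sin_to_cos else sin_part + cos_part


emb = timestep_projection(timestep=0, dim=8)
print(emb)  # at timestep 0 the cosine half is all ones and the sine half is all zeros
```

With the default `flip_sin_to_cos=True`, the cosine features come first, which is the only difference from the unflipped layout.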

from_pretrained Parameters

| Parameter | Type | Description |
|---|---|---|
| pretrained_model_name_or_path | str or os.PathLike | Hub model ID (e.g., "lllyasviel/sd-controlnet-canny") or local directory path. |
| torch_dtype | torch.dtype | Data type for the model weights (e.g., torch.float16). |
| variant | str | Weight variant to load (e.g., "fp16"). |
| subfolder | str | Subfolder within the repo containing the model. |
| cache_dir | str or os.PathLike | Directory for caching downloaded models. |
| use_safetensors | bool | Whether to prefer the safetensors format. |

Return Value

| Type | Description |
|---|---|
| ControlNetModel | A fully initialized ControlNet model with pretrained weights loaded. |

Architecture Components

The ControlNetModel contains the following key components:

| Component | Type | Purpose |
|---|---|---|
| controlnet_cond_embedding | ControlNetConditioningEmbedding | Converts conditioning images to latent-space feature maps |
| conv_in | nn.Conv2d | Initial convolution matching UNet's conv_in |
| time_proj | Timesteps | Timestep sinusoidal projection |
| time_embedding | TimestepEmbedding | MLP to produce timestep embeddings |
| down_blocks | nn.ModuleList | Trainable copy of UNet encoder down blocks |
| mid_block | UNetMidBlock2DCrossAttn | Trainable copy of UNet mid block |
| controlnet_down_blocks | nn.ModuleList | Zero-initialized 1x1 convolutions for each down block output |
| controlnet_mid_block | nn.Conv2d | Zero-initialized 1x1 convolution for mid block output |
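
As a rough illustration of how many zero convolutions `controlnet_down_blocks` holds, the sketch below enumerates their channel counts from `block_out_channels` and `layers_per_block`. It assumes one 1x1 convolution for the `conv_in` output, one per ResNet layer, and one per downsampler in every block except the last; the helper name `controlnet_zero_conv_channels` is hypothetical and not part of diffusers.

```python
def controlnet_zero_conv_channels(block_out_channels=(320, 640, 1280, 1280), layers_per_block=2):
    """List the channel count of each down-path zero convolution (illustrative only)."""
    channels = [block_out_channels[0]]  # residual taken right after conv_in
    last_index = len(block_out_channels) - 1
    for index, out_channels in enumerate(block_out_channels):
        channels += [out_channels] * layers_per_block  # one per ResNet layer
        if index != last_index:
            channels.append(out_channels)  # one for the downsampler output
    return channels


channels = controlnet_zero_conv_channels()
print(len(channels), channels)
```

Under these assumptions the default configuration yields 12 down-path residuals, plus the single mid-block residual from `controlnet_mid_block`.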

ControlNetConditioningEmbedding

class ControlNetConditioningEmbedding(nn.Module):
    def __init__(
        self,
        conditioning_embedding_channels: int,
        conditioning_channels: int = 3,
        block_out_channels: tuple[int, ...] = (16, 32, 96, 256),
    ):
        super().__init__()
        self.conv_in = nn.Conv2d(conditioning_channels, block_out_channels[0], kernel_size=3, padding=1)
        self.blocks = nn.ModuleList([])
        for i in range(len(block_out_channels) - 1):
            channel_in = block_out_channels[i]
            channel_out = block_out_channels[i + 1]
            self.blocks.append(nn.Conv2d(channel_in, channel_in, kernel_size=3, padding=1))
            self.blocks.append(nn.Conv2d(channel_in, channel_out, kernel_size=3, padding=1, stride=2))
        self.conv_out = zero_module(
            nn.Conv2d(block_out_channels[-1], conditioning_embedding_channels, kernel_size=3, padding=1)
        )

    def forward(self, conditioning):
        embedding = self.conv_in(conditioning)
        embedding = F.silu(embedding)
        for block in self.blocks:
            embedding = block(embedding)
            embedding = F.silu(embedding)
        embedding = self.conv_out(embedding)
        return embedding

Source: src/diffusers/models/controlnets/controlnet.py, lines 65-107.
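
The shape arithmetic of the embedding above can be checked without loading any weights. The following sketch applies the standard convolution output-size formula to the layers in the listing; `conv2d_out_size` and `conditioning_embedding_shape` are hypothetical helpers, and the value 320 for `conditioning_embedding_channels` is an assumption that mirrors the first entry of the default `block_out_channels`.

```python
def conv2d_out_size(size, kernel=3, stride=1, padding=1):
    """Standard output-size formula for one spatial dimension of a 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conditioning_embedding_shape(height, width, conditioning_embedding_channels=320,
                                 block_out_channels=(16, 32, 96, 256)):
    """Return (channels, height, width) of the embedding for one conditioning image."""
    h, w = conv2d_out_size(height), conv2d_out_size(width)  # conv_in, stride 1
    for _ in range(len(block_out_channels) - 1):
        h, w = conv2d_out_size(h), conv2d_out_size(w)  # stride-1 conv keeps the size
        h, w = conv2d_out_size(h, stride=2), conv2d_out_size(w, stride=2)  # stride-2 conv halves it
    h, w = conv2d_out_size(h), conv2d_out_size(w)  # conv_out, stride 1
    return (conditioning_embedding_channels, h, w)


print(conditioning_embedding_shape(512, 512))
```

With the default four-entry `block_out_channels` there are three stride-2 convolutions, so a 512x512 conditioning image is reduced by a factor of 8 to 64x64, the same spatial size as the latents of a 512x512 image in Stable Diffusion 1.5.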

Usage Examples

Loading a Pretrained ControlNet

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load a Canny ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16,
)

# Create the pipeline with the ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

Loading Multiple ControlNets

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load multiple ControlNets for multi-modal conditioning
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)

# Pass as a list -- automatically wrapped in MultiControlNetModel
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16,
)
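
When several ControlNets are combined, the residuals they produce have to be merged before reaching the UNet. The sketch below shows one way to picture that step, assuming each ControlNet's residuals are multiplied by its own conditioning scale and then summed element-wise. Plain floats stand in for tensors, and `merge_controlnet_residuals` is a hypothetical name, not the `MultiControlNetModel` implementation.

```python
def merge_controlnet_residuals(per_net_down, per_net_mid, scales):
    """Scale each ControlNet's residuals and sum them across ControlNets (illustrative only)."""
    merged_down = None
    merged_mid = 0.0
    for down, mid, scale in zip(per_net_down, per_net_mid, scales):
        scaled = [scale * residual for residual in down]
        if merged_down is None:
            merged_down = scaled
        else:
            merged_down = [a + b for a, b in zip(merged_down, scaled)]
        merged_mid += scale * mid
    return merged_down, merged_mid


# Two ControlNets (e.g. canny and depth), each with two down residuals and one mid residual.
down, mid = merge_controlnet_residuals(
    per_net_down=[[1.0, 2.0], [10.0, 20.0]],
    per_net_mid=[4.0, 40.0],
    scales=[1.0, 0.5],
)
print(down, mid)
```

At inference time the pipeline is then typically called with one conditioning image per ControlNet, for example `image=[canny_image, depth_image]`, and optionally a matching list for `controlnet_conditioning_scale`; the image variable names here are placeholders.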

Creating ControlNet from an Existing UNet

from diffusers import UNet2DConditionModel, ControlNetModel

# Load a UNet
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)

# Create ControlNet from UNet (useful for training)
controlnet = ControlNetModel.from_unet(
    unet,
    conditioning_channels=3,
    controlnet_conditioning_channel_order="rgb",
)

Available Pretrained Models

| Model ID | Conditioning Type | Base Model |
|---|---|---|
| lllyasviel/sd-controlnet-canny | Canny edges | SD 1.5 |
| lllyasviel/sd-controlnet-depth | Depth maps | SD 1.5 |
| lllyasviel/sd-controlnet-openpose | Pose skeletons | SD 1.5 |
| lllyasviel/sd-controlnet-seg | Segmentation maps | SD 1.5 |
| lllyasviel/sd-controlnet-scribble | Scribbles / sketches | SD 1.5 |
| lllyasviel/sd-controlnet-normal | Normal maps | SD 1.5 |
| lllyasviel/sd-controlnet-mlsd | Straight lines (M-LSD) | SD 1.5 |
| lllyasviel/sd-controlnet-hed | Soft edges (HED) | SD 1.5 |

Notes

  • The from_pretrained method is inherited from ModelMixin and handles downloading, caching, and weight loading.
  • ControlNet supports gradient checkpointing via _supports_gradient_checkpointing = True for memory-efficient training.
  • The zero_module() helper function initializes all parameters of a module to zero, which is essential for the zero convolution pattern.
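
The zero convolution pattern mentioned in the last note can be illustrated with plain PyTorch. In this sketch, `zero_module_sketch` is a stand-in written from the description above (it zeroes every parameter of a module), not the library's own helper, and the tensor shapes are arbitrary.

```python
import torch
from torch import nn


def zero_module_sketch(module: nn.Module) -> nn.Module:
    """Set every parameter of `module` to zero and return it (illustrative stand-in)."""
    for param in module.parameters():
        nn.init.zeros_(param)
    return module


zero_conv = zero_module_sketch(nn.Conv2d(320, 320, kernel_size=1))
features = torch.randn(1, 320, 8, 8)   # stand-in for a ControlNet block output
base = torch.randn(1, 320, 8, 8)       # stand-in for the matching UNet feature map

residual = zero_conv(features)
unchanged = torch.equal(base + residual, base)  # the residual is all zeros at initialization
print(unchanged)

# The weights still receive gradients, so the convolution can move away from zero during training.
residual.sum().backward()
has_gradient = bool(zero_conv.weight.grad.abs().sum() > 0)
print(has_gradient)
```

Because the residual starts at exactly zero, adding a freshly initialized ControlNet leaves the base model's output unchanged, while the nonzero weight gradients let training move the convolution away from zero.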
