Implementation:Huggingface Diffusers ControlNetModel From Pretrained

Property	Value
Implementation Name	ControlNetModel.from_pretrained
Type	API Doc
Workflow	ControlNet_Guided_Generation
Related Principle	Huggingface_Diffusers_ControlNet_Architecture
Source File	`src/diffusers/models/controlnets/controlnet.py`
Lines	L72-L206 (ControlNetConditioningEmbedding + ControlNetModel.__init__)
Status	Active
Implements	Principle:Huggingface_Diffusers_ControlNet_Architecture

API Signature

ControlNetModel.from_pretrained(
    pretrained_model_name_or_path: str | os.PathLike,
    **kwargs,
) -> ControlNetModel

Class: ControlNetModel

Import:

from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

Constructor Parameters (ControlNetModel.__init__)

Parameter	Type	Default	Description
`in_channels`	`int`	`4`	Number of channels in the input sample (latent channels).
`conditioning_channels`	`int`	`3`	Number of channels in the conditioning image (typically RGB=3).
`flip_sin_to_cos`	`bool`	`True`	Whether to flip sin to cos in the time embedding.
`freq_shift`	`int`	`0`	Frequency shift for the time embedding.
`down_block_types`	`tuple[str, ...]`	`("CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D")`	Types of down blocks to use, mirroring the UNet encoder.
`mid_block_type`	`str | None`	`"UNetMidBlock2DCrossAttn"`	Type of mid block.
`block_out_channels`	`tuple[int, ...]`	`(320, 640, 1280, 1280)`	Output channels for each block.
`layers_per_block`	`int`	`2`	Number of ResNet layers per down block.
`cross_attention_dim`	`int`	`1280`	Dimension of cross attention features (text embedding dim).
`attention_head_dim`	`int | tuple[int, ...]`	`8`	Dimension of attention heads, either one value for all blocks or one value per block.
`conditioning_embedding_out_channels`	`tuple[int, ...]`	`(16, 32, 96, 256)`	Channel progression in the conditioning embedding network.
`controlnet_conditioning_channel_order`	`str`	`"rgb"`	Channel order of conditioning image (`"rgb"` or `"bgr"`).
`global_pool_conditions`	`bool`	`False`	Whether to global-average-pool conditioning features (enables guess mode behavior).

from_pretrained Parameters

Parameter	Type	Description
`pretrained_model_name_or_path`	`str | os.PathLike`	Hub model ID (e.g., `"lllyasviel/sd-controlnet-canny"`) or local directory path.
`torch_dtype`	`torch.dtype`	Data type for the model weights (e.g., `torch.float16`).
`variant`	`str`	Weight variant to load (e.g., `"fp16"`).
`subfolder`	`str`	Subfolder within the repo containing the model.
`cache_dir`	`str | os.PathLike`	Directory for caching downloaded models.
`use_safetensors`	`bool`	Whether to prefer safetensors format.

Return Value

Type	Description
`ControlNetModel`	A fully initialized ControlNet model with pretrained weights loaded.

Architecture Components

The ControlNetModel contains the following key components:

Component	Type	Purpose
`controlnet_cond_embedding`	`ControlNetConditioningEmbedding`	Converts conditioning images to latent-space feature maps
`conv_in`	`nn.Conv2d`	Initial convolution matching UNet's conv_in
`time_proj`	`Timesteps`	Timestep sinusoidal projection
`time_embedding`	`TimestepEmbedding`	MLP to produce timestep embeddings
`down_blocks`	`nn.ModuleList`	Trainable copy of UNet encoder down blocks
`mid_block`	`UNetMidBlock2DCrossAttn`	Trainable copy of UNet mid block
`controlnet_down_blocks`	`nn.ModuleList`	Zero-initialized 1x1 convolutions for each down block output
`controlnet_mid_block`	`nn.Conv2d`	Zero-initialized 1x1 convolution for mid block output
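
As a rough illustration of how many zero-initialized projections `controlnet_down_blocks` holds, the sketch below counts them for the default constructor arguments. It assumes one projection for the `conv_in` output, one per ResNet layer in each down block, and one per downsampler in every block except the last; the helper name `controlnet_down_block_channels` is hypothetical and not part of diffusers.

```python
def controlnet_down_block_channels(
    block_out_channels=(320, 640, 1280, 1280),
    layers_per_block=2,
):
    """Return the channel count of each 1x1 projection (hypothetical helper)."""
    # Assumption: the first projection handles the output of conv_in.
    channels = [block_out_channels[0]]
    for i, out_ch in enumerate(block_out_channels):
        is_final_block = i == len(block_out_channels) - 1
        # One projection per ResNet layer in this down block.
        channels.extend([out_ch] * layers_per_block)
        # Every block except the last ends with a downsampler, which
        # contributes one more residual and therefore one more projection.
        if not is_final_block:
            channels.append(out_ch)
    return channels


down_channels = controlnet_down_block_channels()
print(len(down_channels), down_channels)
```

Under these assumptions the default configuration yields 12 down-block projections, plus the single `controlnet_mid_block` convolution operating on `block_out_channels[-1]` channels.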

ControlNetConditioningEmbedding

class ControlNetConditioningEmbedding(nn.Module):
    def __init__(
        self,
        conditioning_embedding_channels: int,
        conditioning_channels: int = 3,
        block_out_channels: tuple[int, ...] = (16, 32, 96, 256),
    ):
        super().__init__()
        self.conv_in = nn.Conv2d(conditioning_channels, block_out_channels[0], kernel_size=3, padding=1)
        self.blocks = nn.ModuleList([])
        for i in range(len(block_out_channels) - 1):
            channel_in = block_out_channels[i]
            channel_out = block_out_channels[i + 1]
            self.blocks.append(nn.Conv2d(channel_in, channel_in, kernel_size=3, padding=1))
            self.blocks.append(nn.Conv2d(channel_in, channel_out, kernel_size=3, padding=1, stride=2))
        self.conv_out = zero_module(
            nn.Conv2d(block_out_channels[-1], conditioning_embedding_channels, kernel_size=3, padding=1)
        )

    def forward(self, conditioning):
        embedding = self.conv_in(conditioning)
        embedding = F.silu(embedding)
        for block in self.blocks:
            embedding = block(embedding)
            embedding = F.silu(embedding)
        embedding = self.conv_out(embedding)
        return embedding

Source: src/diffusers/models/controlnets/controlnet.py, lines 65-107.
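
The embedding above reduces spatial resolution through its three stride-2 convolutions. The following sketch works through that arithmetic with the standard convolution output-size formula, assuming the default `block_out_channels` and a square input. The helper names are hypothetical and exist only for this illustration.

```python
def conv2d_out_size(size: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Standard output-size formula for a convolution (dilation of 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conditioning_embedding_size(size: int, block_out_channels=(16, 32, 96, 256)) -> int:
    """Trace the spatial size through conv_in, the blocks, and conv_out."""
    size = conv2d_out_size(size)  # conv_in: stride 1, size unchanged
    for _ in range(len(block_out_channels) - 1):
        size = conv2d_out_size(size)            # stride-1 conv, size unchanged
        size = conv2d_out_size(size, stride=2)  # stride-2 conv, size halved
    return conv2d_out_size(size)  # conv_out: stride 1, size unchanged


embedding_size = conditioning_embedding_size(512)
print(embedding_size)
```

For a 512x512 conditioning image this gives a 64x64 feature map, an 8x reduction, which is the same factor by which the Stable Diffusion VAE downsamples images into latents.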

Usage Examples

Loading a Pretrained ControlNet

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load a Canny ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16,
)

# Create the pipeline with the ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

Loading Multiple ControlNets

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load multiple ControlNets for multi-modal conditioning
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)

# Pass as a list -- automatically wrapped in MultiControlNetModel
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16,
)

Creating ControlNet from an Existing UNet

from diffusers import UNet2DConditionModel, ControlNetModel

# Load a UNet
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)

# Create ControlNet from UNet (useful for training)
controlnet = ControlNetModel.from_unet(
    unet,
    conditioning_channels=3,
    controlnet_conditioning_channel_order="rgb",
)

Available Pretrained Models

Model ID	Conditioning Type	Base Model
`lllyasviel/sd-controlnet-canny`	Canny edges	SD 1.5
`lllyasviel/sd-controlnet-depth`	Depth maps	SD 1.5
`lllyasviel/sd-controlnet-openpose`	Pose skeletons	SD 1.5
`lllyasviel/sd-controlnet-seg`	Segmentation maps	SD 1.5
`lllyasviel/sd-controlnet-scribble`	Scribbles / sketches	SD 1.5
`lllyasviel/sd-controlnet-normal`	Normal maps	SD 1.5
`lllyasviel/sd-controlnet-mlsd`	Straight lines (M-LSD)	SD 1.5
`lllyasviel/sd-controlnet-hed`	Soft edges (HED)	SD 1.5

Notes

The from_pretrained method is inherited from ModelMixin and handles downloading, caching, and weight loading.
ControlNet supports gradient checkpointing via _supports_gradient_checkpointing = True for memory-efficient training.
The zero_module() helper function initializes all parameters of a module to zero, which is essential for the zero convolution pattern.
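
To make the zero convolution pattern concrete, here is a minimal sketch of a helper that behaves as the note describes, written in plain PyTorch. The name `zero_module_sketch` is hypothetical, and the body is an assumption about how such a helper can be written rather than a copy of the library's code.

```python
import torch
from torch import nn


def zero_module_sketch(module: nn.Module) -> nn.Module:
    """Set every parameter (weights and biases) of `module` to zero."""
    for param in module.parameters():
        nn.init.zeros_(param)
    return module


# A zero-initialized 1x1 convolution, like those in controlnet_down_blocks.
zero_conv = zero_module_sketch(nn.Conv2d(320, 320, kernel_size=1))

features = torch.randn(1, 320, 8, 8)
residual = zero_conv(features)

# With all parameters at zero, the output is zero for any input.
all_zero = bool(torch.all(residual == 0))
print(residual.shape, all_zero)
```

Because the residual is zero before any training step, adding it to the UNet features leaves the base model's output unchanged at initialization. Gradients with respect to the zeroed weights are still nonzero in general, so the projections can move away from zero during training.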

Related Pages

Huggingface_Diffusers_ControlNet_Architecture -- Principle: the architectural design behind ControlNet
Huggingface_Diffusers_ControlNetModel_Forward -- The forward pass of the loaded model
Huggingface_Diffusers_ControlNet_Pipeline_Call -- The pipeline that orchestrates ControlNet inference
