Implementation:Huggingface Diffusers ControlNetModel From Pretrained
| Property | Value |
|---|---|
| Implementation Name | ControlNetModel.from_pretrained |
| Type | API Doc |
| Workflow | ControlNet_Guided_Generation |
| Related Principle | Huggingface_Diffusers_ControlNet_Architecture |
| Source File | src/diffusers/models/controlnets/controlnet.py |
| Lines | L72-L206 (ControlNetConditioningEmbedding + ControlNetModel.__init__) |
| Status | Active |
| Implements | Principle:Huggingface_Diffusers_ControlNet_Architecture |
API Signature
ControlNetModel.from_pretrained(
    pretrained_model_name_or_path: str | os.PathLike,
    **kwargs,
) -> ControlNetModel
Class: ControlNetModel
Import:
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
Constructor Parameters (ControlNetModel.__init__)
| Parameter | Type | Default | Description |
|---|---|---|---|
| in_channels | int | 4 | Number of channels in the input sample (latent channels). |
| conditioning_channels | int | 3 | Number of channels in the conditioning image (typically RGB=3). |
| flip_sin_to_cos | bool | True | Whether to flip sin to cos in the time embedding. |
| freq_shift | int | 0 | Frequency shift for the time embedding. |
| down_block_types | tuple[str, ...] | ("CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D") | Types of down blocks to use, mirroring the UNet encoder. |
| mid_block_type | str \| None | "UNetMidBlock2DCrossAttn" | Type of mid block. |
| block_out_channels | tuple[int, ...] | (320, 640, 1280, 1280) | Output channels for each block. |
| layers_per_block | int | 2 | Number of ResNet layers per down block. |
| cross_attention_dim | int | 1280 | Dimension of cross attention features (text embedding dim). |
| attention_head_dim | int \| tuple[int, ...] | 8 | Dimension of attention heads. |
| conditioning_embedding_out_channels | tuple[int, ...] | (16, 32, 96, 256) | Channel progression in the conditioning embedding network. |
| controlnet_conditioning_channel_order | str | "rgb" | Channel order of conditioning image ("rgb" or "bgr"). |
| global_pool_conditions | bool | False | Whether to global-average-pool conditioning features (enables guess mode behavior). |
from_pretrained Parameters
| Parameter | Type | Description |
|---|---|---|
| pretrained_model_name_or_path | str \| os.PathLike | Hub model ID (e.g., "lllyasviel/sd-controlnet-canny") or local directory path. |
| torch_dtype | torch.dtype | Data type for the model weights (e.g., torch.float16). |
| variant | str | Weight variant to load (e.g., "fp16"). |
| subfolder | str | Subfolder within the repo containing the model. |
| cache_dir | str \| os.PathLike | Directory for caching downloaded models. |
| use_safetensors | bool | Whether to prefer safetensors format. |
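The optional parameters above are all passed as keyword arguments, so a caller usually sets only the ones it needs. The following is a minimal sketch of that pattern, not part of the library: `controlnet_load_kwargs` and `load_controlnet` are hypothetical helper names introduced here for illustration, and only `ControlNetModel.from_pretrained` and the keyword names from the table come from the page.

```python
def controlnet_load_kwargs(torch_dtype=None, variant=None, subfolder=None,
                           cache_dir=None, use_safetensors=None):
    """Return only the loading options that were explicitly set (hypothetical helper)."""
    options = {
        "torch_dtype": torch_dtype,
        "variant": variant,
        "subfolder": subfolder,
        "cache_dir": cache_dir,
        "use_safetensors": use_safetensors,
    }
    # Dropping unset entries lets from_pretrained fall back to its own defaults.
    return {name: value for name, value in options.items() if value is not None}


def load_controlnet(model_id_or_path, **options):
    """Load a ControlNet, forwarding the filtered options (hypothetical helper)."""
    # Imported lazily so the helper above can be used without diffusers installed.
    from diffusers import ControlNetModel

    return ControlNetModel.from_pretrained(
        model_id_or_path, **controlnet_load_kwargs(**options)
    )


kwargs = controlnet_load_kwargs(variant="fp16", use_safetensors=True)
print(kwargs)
```

A call such as `load_controlnet("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)` would then be equivalent to calling `from_pretrained` directly with that dtype. Whether a given repository actually publishes an "fp16" variant is not stated on this page and would need checking.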
Return Value
| Type | Description |
|---|---|
| ControlNetModel | A fully initialized ControlNet model with pretrained weights loaded. |
Architecture Components
The ControlNetModel contains the following key components:
| Component | Type | Purpose |
|---|---|---|
| controlnet_cond_embedding | ControlNetConditioningEmbedding | Converts conditioning images to latent-space feature maps |
| conv_in | nn.Conv2d | Initial convolution matching UNet's conv_in |
| time_proj | Timesteps | Timestep sinusoidal projection |
| time_embedding | TimestepEmbedding | MLP to produce timestep embeddings |
| down_blocks | nn.ModuleList | Trainable copy of UNet encoder down blocks |
| mid_block | UNetMidBlock2DCrossAttn | Trainable copy of UNet mid block |
| controlnet_down_blocks | nn.ModuleList | Zero-initialized 1x1 convolutions for each down block output |
| controlnet_mid_block | nn.Conv2d | Zero-initialized 1x1 convolution for mid block output |
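To make the role of controlnet_down_blocks concrete, the sketch below counts the zero-convolution outputs for the default configuration. It assumes the layout used for the matching UNet skip connections: one projection for the conv_in output, one per ResNet layer in each down block, and one per downsampler (every block except the last). The function name `controlnet_residual_channels` is hypothetical and the code is plain arithmetic, not the library's constructor.

```python
def controlnet_residual_channels(block_out_channels=(320, 640, 1280, 1280),
                                 layers_per_block=2):
    """Channel widths of the down-path zero convolutions, plus the mid-block width.

    Assumed layout (see the lead-in): conv_in output, then for every down block
    one entry per ResNet layer and one for its downsampler, except the final
    block, which has no downsampler.
    """
    down = [block_out_channels[0]]  # projection applied to the conv_in output
    last = len(block_out_channels) - 1
    for i, channels in enumerate(block_out_channels):
        down.extend([channels] * layers_per_block)  # one per ResNet layer
        if i != last:
            down.append(channels)  # downsampler output keeps the block's width
    mid = block_out_channels[-1]  # single projection for the mid block
    return down, mid


down, mid = controlnet_residual_channels()
print(len(down), down, mid)
```

Under these assumptions the default configuration yields 12 down-path residuals and one mid-block residual, which is the set of tensors the UNet adds to its own skip connections.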
ControlNetConditioningEmbedding
class ControlNetConditioningEmbedding(nn.Module):
    def __init__(
        self,
        conditioning_embedding_channels: int,
        conditioning_channels: int = 3,
        block_out_channels: tuple[int, ...] = (16, 32, 96, 256),
    ):
        super().__init__()
        self.conv_in = nn.Conv2d(conditioning_channels, block_out_channels[0], kernel_size=3, padding=1)
        self.blocks = nn.ModuleList([])
        for i in range(len(block_out_channels) - 1):
            channel_in = block_out_channels[i]
            channel_out = block_out_channels[i + 1]
            self.blocks.append(nn.Conv2d(channel_in, channel_in, kernel_size=3, padding=1))
            self.blocks.append(nn.Conv2d(channel_in, channel_out, kernel_size=3, padding=1, stride=2))
        self.conv_out = zero_module(
            nn.Conv2d(block_out_channels[-1], conditioning_embedding_channels, kernel_size=3, padding=1)
        )

    def forward(self, conditioning):
        embedding = self.conv_in(conditioning)
        embedding = F.silu(embedding)
        for block in self.blocks:
            embedding = block(embedding)
            embedding = F.silu(embedding)
        embedding = self.conv_out(embedding)
        return embedding
Source: src/diffusers/models/controlnets/controlnet.py, lines 65-107.
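The embedding network above contains three stride-2 convolutions when block_out_channels has four entries, so it shrinks the conditioning image by a factor of 8. The sketch below traces the spatial size through the layers using the standard convolution output-size formula; the function names are hypothetical and chosen only for this illustration. For Stable Diffusion 1.5, where the VAE also downsamples by 8, a 512x512 conditioning image ends up at the same 64x64 resolution as the latent sample.

```python
def conv2d_out_size(size, kernel_size=3, padding=1, stride=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conditioning_embedding_size(image_size, block_out_channels=(16, 32, 96, 256)):
    """Spatial size after ControlNetConditioningEmbedding, following the code above."""
    size = conv2d_out_size(image_size)  # conv_in: stride 1, size unchanged
    for _ in range(len(block_out_channels) - 1):
        size = conv2d_out_size(size)            # stride-1 conv keeps the size
        size = conv2d_out_size(size, stride=2)  # stride-2 conv halves it
    return conv2d_out_size(size)  # conv_out: stride 1, size unchanged


for image_size in (512, 768):
    print(image_size, "->", conditioning_embedding_size(image_size))
```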
Usage Examples
Loading a Pretrained ControlNet
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load a Canny ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16,
)

# Create the pipeline with the ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
Loading Multiple ControlNets
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load multiple ControlNets for multi-modal conditioning
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)

# Pass as a list -- automatically wrapped in MultiControlNetModel
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16,
)
Creating ControlNet from an Existing UNet
from diffusers import UNet2DConditionModel, ControlNetModel

# Load a UNet
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)

# Create ControlNet from UNet (useful for training)
controlnet = ControlNetModel.from_unet(
    unet,
    conditioning_channels=3,
    controlnet_conditioning_channel_order="rgb",
)
Available Pretrained Models
| Model ID | Conditioning Type | Base Model |
|---|---|---|
| lllyasviel/sd-controlnet-canny | Canny edges | SD 1.5 |
| lllyasviel/sd-controlnet-depth | Depth maps | SD 1.5 |
| lllyasviel/sd-controlnet-openpose | Pose skeletons | SD 1.5 |
| lllyasviel/sd-controlnet-seg | Segmentation maps | SD 1.5 |
| lllyasviel/sd-controlnet-scribble | Scribbles / sketches | SD 1.5 |
| lllyasviel/sd-controlnet-normal | Normal maps | SD 1.5 |
| lllyasviel/sd-controlnet-mlsd | Straight lines (M-LSD) | SD 1.5 |
| lllyasviel/sd-controlnet-hed | Soft edges (HED) | SD 1.5 |
Notes
- The from_pretrained method is inherited from ModelMixin and handles downloading, caching, and weight loading.
- ControlNet supports gradient checkpointing via _supports_gradient_checkpointing = True for memory-efficient training.
- The zero_module() helper function initializes all parameters of a module to zero, which is essential for the zero convolution pattern.
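The zero convolution pattern mentioned in the last note can be illustrated with a small PyTorch sketch. The `zero_module` below is a re-implementation written from the description above (set every parameter to zero and return the module), and the tensor shapes and variable names are arbitrary choices for the demonstration. It shows two properties: a freshly zeroed projection adds nothing to the UNet features, yet its weights still receive a nonzero gradient, so training can move them away from zero.

```python
import torch
from torch import nn


def zero_module(module: nn.Module) -> nn.Module:
    """Set every parameter of the module to zero and return it (sketch)."""
    for param in module.parameters():
        nn.init.zeros_(param)
    return module


torch.manual_seed(0)
zero_conv = zero_module(nn.Conv2d(320, 320, kernel_size=1))

controlnet_features = torch.randn(1, 320, 8, 8)  # stand-in for a down block output
unet_skip = torch.randn(1, 320, 8, 8)            # stand-in for the UNet skip tensor

residual = zero_conv(controlnet_features)
combined = unet_skip + residual  # identical to unet_skip while weights are zero

# The output is zero, but the weight gradient depends on the input features,
# so it is generally nonzero and the projection can still learn.
residual.sum().backward()
print(torch.count_nonzero(residual).item())
print(zero_conv.weight.grad.abs().sum().item() > 0)
```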
Related Pages
- Huggingface_Diffusers_ControlNet_Architecture -- Principle: the architectural design behind ControlNet
- Huggingface_Diffusers_ControlNetModel_Forward -- The forward pass of the loaded model
- Huggingface_Diffusers_ControlNet_Pipeline_Call -- The pipeline that orchestrates ControlNet inference