Implementation: Hugging Face Diffusers enable_model_cpu_offload
| Knowledge Sources | |
|---|---|
| Domains | Diffusion_Models, Memory_Management, GPU_Optimization |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
A concrete tool provided by the Diffusers library for enabling model-level CPU offloading on a diffusion pipeline to reduce GPU memory usage.
Description
enable_model_cpu_offload is an instance method on DiffusionPipeline that configures automatic CPU-GPU transfer hooks for all model components. When called, it first moves all components to CPU and clears the GPU cache. It then iterates through the pipeline's model_cpu_offload_seq attribute (a string like "text_encoder->text_encoder_2->unet->vae") and registers chained cpu_offload_with_hook hooks from the Accelerate library on each component. This ensures that only the currently active model resides on the GPU at any time.
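As a rough illustration of the chaining described above, the sketch below parses an offload-sequence string and records, for each component, which model its hook will push back to the CPU. This is a hypothetical helper for illustration only, not the actual Diffusers implementation (which delegates the hook mechanics to Accelerate's cpu_offload_with_hook).

```python
def register_chained_hooks(offload_seq: str) -> list[dict]:
    """Sketch: derive the hook-registration order from a
    model_cpu_offload_seq string. Hypothetical helper, not Diffusers code."""
    hooks = []
    prev_hook = None
    for name in offload_seq.split("->"):
        # In Diffusers, this step calls accelerate's cpu_offload_with_hook,
        # passing the previous hook so that running this model offloads
        # the one before it back to CPU.
        hook = {
            "component": name,
            "offloads_previous": prev_hook["component"] if prev_hook else None,
        }
        hooks.append(hook)
        prev_hook = hook
    return hooks

hooks = register_chained_hooks("text_encoder->text_encoder_2->unet->vae")
for h in hooks:
    print(h["component"], "offloads:", h["offloads_previous"])
```

The effect is that at most one component from the sequence occupies GPU memory at a time: activating `unet` offloads `text_encoder_2`, activating `vae` offloads `unet`, and so on.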
Components not listed in the offload sequence but present in the pipeline are handled separately: those in the _exclude_from_cpu_offload set are placed directly on the GPU device, while all others receive their own offload hooks. The method also handles edge cases such as models loaded with 8-bit bitsandbytes quantization (which are already on GPU) and pipelines with active device maps.
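The three-way split described above can be sketched as follows. The function name and structure are hypothetical, used only to make the classification rule concrete; Diffusers implements this inline inside enable_model_cpu_offload.

```python
def partition_components(components, offload_seq, exclude_from_cpu_offload):
    """Sketch of the split described above: components in the offload
    sequence get chained hooks, excluded components are placed directly
    on the GPU, and everything else gets a standalone offload hook.
    Hypothetical helper, not Diffusers code."""
    in_sequence = set(offload_seq.split("->"))
    chained, on_gpu, standalone = [], [], []
    for name in components:
        if name in in_sequence:
            chained.append(name)      # chained cpu_offload_with_hook
        elif name in exclude_from_cpu_offload:
            on_gpu.append(name)       # placed directly on the device
        else:
            standalone.append(name)   # independent offload hook
    return chained, on_gpu, standalone

chained, on_gpu, standalone = partition_components(
    ["text_encoder", "unet", "vae", "safety_checker", "image_encoder"],
    "text_encoder->unet->vae",
    {"safety_checker"},
)
```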
Usage
Call enable_model_cpu_offload() on a pipeline instance after loading it and before running inference. This is a single-line memory optimization that requires the accelerate library (version 0.17.0 or higher). Do not combine it with device_map or manual .to(device) calls on the pipeline.
Code Reference
Source Location
- Repository: diffusers
- File: src/diffusers/pipelines/pipeline_utils.py
- Lines: 1174-1270
Signature
```python
def enable_model_cpu_offload(
    self,
    gpu_id: int | None = None,
    device: torch.device | str | None = None,
) -> None:
```
Import
```python
from diffusers import DiffusionPipeline
# enable_model_cpu_offload is an instance method on pipeline objects
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| gpu_id | int or None | No | The ID of the GPU to use for inference. Defaults to 0 if not specified. Cannot be used together with a device index in the device parameter. |
| device | torch.device or str or None | No | The PyTorch device type for the accelerator (e.g., "cuda", "mps"). If None, the available accelerator is detected automatically. Must not be "cpu". |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | Modifies the pipeline in place by registering CPU offload hooks on all model components; no value is returned. |
Usage Examples
Basic Usage
```python
from diffusers import DiffusionPipeline
import torch

# Load a large pipeline that might not fit in GPU memory
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Enable model-level CPU offloading (single-line optimization)
pipe.enable_model_cpu_offload()

# Run inference as usual - offloading is handled transparently
image = pipe("A serene landscape with rolling hills").images[0]
image.save("landscape.png")
```
Specifying a GPU
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Use a specific GPU for offloading
pipe.enable_model_cpu_offload(gpu_id=1)

image = pipe("A futuristic city skyline at night").images[0]
```