Implementation: Hugging Face Diffusers enable_model_cpu_offload
| Knowledge Sources | |
|---|---|
| Domains | Diffusion_Models, Memory_Management, GPU_Optimization |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
A concrete tool provided by the Diffusers library for enabling model-level CPU offloading on a diffusion pipeline to reduce GPU memory usage.
Description
enable_model_cpu_offload is an instance method on DiffusionPipeline that configures automatic CPU-GPU transfer hooks for all model components. When called, it first moves all components to CPU and clears the GPU cache. It then iterates through the pipeline's model_cpu_offload_seq attribute (a string like "text_encoder->text_encoder_2->unet->vae") and registers chained cpu_offload_with_hook hooks from the Accelerate library on each component. This ensures that only the currently active model resides on the GPU at any time.
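As a rough illustration of the chaining described above, the sketch below parses an offload-sequence string and records, for each component, which model its hook will push back to the CPU. This is a hypothetical helper for illustration only, not the actual Diffusers implementation (which delegates the hook mechanics to Accelerate's cpu_offload_with_hook).

```python
def register_chained_hooks(offload_seq: str) -> list[dict]:
    """Sketch: derive the hook-registration order from a
    model_cpu_offload_seq string. Hypothetical helper, not Diffusers code."""
    hooks = []
    prev_hook = None
    for name in offload_seq.split("->"):
        # In Diffusers, this step calls accelerate's cpu_offload_with_hook,
        # passing the previous hook so that running this model offloads
        # the one before it back to CPU.
        hook = {
            "component": name,
            "offloads_previous": prev_hook["component"] if prev_hook else None,
        }
        hooks.append(hook)
        prev_hook = hook
    return hooks

hooks = register_chained_hooks("text_encoder->text_encoder_2->unet->vae")
for h in hooks:
    print(h["component"], "offloads:", h["offloads_previous"])
```

The effect is that at most one component from the sequence occupies GPU memory at a time: activating `unet` offloads `text_encoder_2`, activating `vae` offloads `unet`, and so on.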
Components not listed in the offload sequence but present in the pipeline are handled separately: those in the _exclude_from_cpu_offload set are placed directly on the GPU device, while all others receive their own offload hooks. The method also handles edge cases such as models loaded with 8-bit bitsandbytes quantization (which are already on GPU) and pipelines with active device maps.
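The three-way split described above can be sketched as follows. The function name and structure are hypothetical, used only to make the classification rule concrete; Diffusers implements this inline inside enable_model_cpu_offload.

```python
def partition_components(components, offload_seq, exclude_from_cpu_offload):
    """Sketch of the split described above: components in the offload
    sequence get chained hooks, excluded components are placed directly
    on the GPU, and everything else gets a standalone offload hook.
    Hypothetical helper, not Diffusers code."""
    in_sequence = set(offload_seq.split("->"))
    chained, on_gpu, standalone = [], [], []
    for name in components:
        if name in in_sequence:
            chained.append(name)      # chained cpu_offload_with_hook
        elif name in exclude_from_cpu_offload:
            on_gpu.append(name)       # placed directly on the device
        else:
            standalone.append(name)   # independent offload hook
    return chained, on_gpu, standalone

chained, on_gpu, standalone = partition_components(
    ["text_encoder", "unet", "vae", "safety_checker", "image_encoder"],
    "text_encoder->unet->vae",
    {"safety_checker"},
)
```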
Usage
Call enable_model_cpu_offload() on a pipeline instance after loading it and before running inference. This is a single-line memory optimization that requires the accelerate library (version 0.17.0 or higher). Do not combine it with device_map or manual .to(device) calls on the pipeline.
Code Reference
Source Location
- Repository: diffusers
- File: src/diffusers/pipelines/pipeline_utils.py
- Lines: 1174-1270
Signature
```python
def enable_model_cpu_offload(
    self,
    gpu_id: int | None = None,
    device: torch.device | str | None = None,
) -> None:
```
Import
```python
from diffusers import DiffusionPipeline
# enable_model_cpu_offload is an instance method on pipeline objects
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| gpu_id | int or None | No | The ID of the GPU to use for inference. Defaults to 0 if not specified. Cannot be used together with a device index in the device parameter. |
| device | torch.device or str or None | No | The PyTorch device type for the accelerator (e.g., "cuda", "mps"). If None, the available accelerator is detected automatically. Must not be "cpu". |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | Modifies the pipeline in place by registering CPU offload hooks on all model components; no value is returned. |
Usage Examples
Basic Usage
```python
from diffusers import DiffusionPipeline
import torch

# Load a large pipeline that might not fit in GPU memory
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Enable model-level CPU offloading (single-line optimization)
pipe.enable_model_cpu_offload()

# Run inference as usual - offloading is handled transparently
image = pipe("A serene landscape with rolling hills").images[0]
image.save("landscape.png")
```
Specifying a GPU
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Use a specific GPU for offloading
pipe.enable_model_cpu_offload(gpu_id=1)

image = pipe("A futuristic city skyline at night").images[0]
```