
Implementation:Huggingface Diffusers Enable Model Cpu Offload

From Leeroopedia
Knowledge Sources
Domains Diffusion_Models, Memory_Management, GPU_Optimization
Last Updated 2026-02-13 21:00 GMT

Overview

A concrete tool, provided by the Diffusers library, for enabling model-level CPU offloading on a diffusion pipeline to reduce GPU memory usage.

Description

enable_model_cpu_offload is an instance method on DiffusionPipeline that configures automatic CPU-GPU transfer hooks for all model components. When called, it first moves all components to CPU and clears the GPU cache. It then iterates through the pipeline's model_cpu_offload_seq attribute (a string like "text_encoder->text_encoder_2->unet->vae") and registers chained cpu_offload_with_hook hooks from the Accelerate library on each component. This ensures that only the currently active model resides on the GPU at any time.
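The "only the active model on GPU" behavior can be illustrated with a small, library-free sketch. This is a conceptual simulation, not the Accelerate implementation: the OffloadChain class and its methods are hypothetical stand-ins for the chained hooks that cpu_offload_with_hook registers.

```python
# Conceptual sketch: simulate the "only one model on GPU at a time"
# behavior produced by chained CPU-offload hooks. OffloadChain is a
# hypothetical stand-in, not the real Accelerate hook machinery.

class OffloadChain:
    def __init__(self, offload_seq: str):
        # model_cpu_offload_seq is a string like
        # "text_encoder->text_encoder_2->unet->vae"
        self.order = offload_seq.split("->")
        # Every component starts on the CPU.
        self.placement = {name: "cpu" for name in self.order}

    def activate(self, name: str):
        # Activating one model first evicts whichever model is
        # currently resident, mirroring the chained hooks.
        for other, dev in self.placement.items():
            if dev == "cuda":
                self.placement[other] = "cpu"
        self.placement[name] = "cuda"

    def on_gpu(self):
        return [n for n, d in self.placement.items() if d == "cuda"]


chain = OffloadChain("text_encoder->text_encoder_2->unet->vae")
chain.activate("text_encoder")   # text encoding runs first
chain.activate("unet")           # denoising loop: the encoder is evicted
print(chain.on_gpu())            # → ['unet']
```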

Components not listed in the offload sequence but present in the pipeline are handled separately: those in the _exclude_from_cpu_offload set are placed directly on the GPU device, while all others receive their own offload hooks. The method also handles edge cases such as models loaded with 8-bit bitsandbytes quantization (which are already on GPU) and pipelines with active device maps.
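The three-way routing described above can be sketched as follows. route_components is a hypothetical helper written for illustration; the real method performs this routing inline while registering hooks.

```python
# Sketch of how pipeline components are routed into three groups:
# chained hooks, direct GPU placement, or standalone hooks.
# route_components is a hypothetical helper for illustration only.

def route_components(components, offload_seq, exclude):
    seq = set(offload_seq.split("->"))
    chained, pinned, hooked = [], [], []
    for name in components:
        if name in seq:
            chained.append(name)   # gets a chained offload hook
        elif name in exclude:
            pinned.append(name)    # placed directly on the GPU device
        else:
            hooked.append(name)    # gets its own standalone hook
    return chained, pinned, hooked


groups = route_components(
    components=["text_encoder", "unet", "vae", "safety_checker", "image_encoder"],
    offload_seq="text_encoder->unet->vae",
    exclude={"safety_checker"},
)
print(groups)
# → (['text_encoder', 'unet', 'vae'], ['safety_checker'], ['image_encoder'])
```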

Usage

Call enable_model_cpu_offload() on a pipeline instance after loading it and before running inference. This is a single-line memory optimization that requires the accelerate library (version 0.17.0 or higher). Do not combine it with device_map or manual .to(device) calls on the pipeline.
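A minimal pre-flight check for the accelerate>=0.17.0 requirement can look like the sketch below. Both function names are hypothetical helpers written for this page; parse_simple_version handles only dotted numeric versions, and production code should prefer packaging.version.Version.

```python
# Minimal pre-flight check for the accelerate >= 0.17.0 requirement.
# parse_simple_version is a simplified hypothetical helper for dotted
# numeric versions; prefer packaging.version.Version in real code.
from importlib.metadata import PackageNotFoundError, version


def parse_simple_version(v: str) -> tuple:
    # "0.17.0" -> (0, 17, 0); ignores non-numeric suffixes like ".dev0"
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def accelerate_supports_offload(minimum: str = "0.17.0") -> bool:
    try:
        installed = version("accelerate")
    except PackageNotFoundError:
        # enable_model_cpu_offload would raise in this situation
        return False
    return parse_simple_version(installed) >= parse_simple_version(minimum)


print(accelerate_supports_offload())
```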

Code Reference

Source Location

  • Repository: diffusers
  • File: src/diffusers/pipelines/pipeline_utils.py
  • Lines: 1174-1270

Signature

def enable_model_cpu_offload(
    self,
    gpu_id: int | None = None,
    device: torch.device | str | None = None,
) -> None:

Import

from diffusers import DiffusionPipeline
# enable_model_cpu_offload is an instance method on pipeline objects

I/O Contract

Inputs

Name Type Required Description
gpu_id int or None No The ID of the GPU to use for inference. If not specified, defaults to 0. Cannot be used together with a device index in the device parameter.
device torch.device or str or None No The PyTorch device type for the accelerator (e.g., "cuda", "mps"). If None, automatically detects the available accelerator. Must not be "cpu".
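The constraints in the table above (gpu_id may not be combined with an index inside device, and device must not be "cpu") can be sketched as a small resolution function. resolve_offload_device is hypothetical, and the exact error types and auto-detection behavior are assumptions for illustration.

```python
# Sketch of the documented gpu_id/device rules. resolve_offload_device
# is a hypothetical helper; the error types and the CUDA fallback for
# auto-detection are assumptions made for this illustration.

def resolve_offload_device(gpu_id=None, device=None):
    device = device or "cuda"  # assume auto-detection chose CUDA
    if device == "cpu":
        raise ValueError("offload target must be an accelerator, not 'cpu'")
    if ":" in device:
        # Device string already carries an index, e.g. "cuda:1".
        if gpu_id is not None:
            raise ValueError("cannot pass gpu_id and an indexed device")
        return device
    return f"{device}:{gpu_id if gpu_id is not None else 0}"


print(resolve_offload_device())                # → cuda:0
print(resolve_offload_device(gpu_id=1))        # → cuda:1
print(resolve_offload_device(device="cuda:1")) # → cuda:1
```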

Outputs

Name Type Description
(none) None This method modifies the pipeline in-place by registering CPU offload hooks on all model components. It does not return a value.

Usage Examples

Basic Usage

from diffusers import DiffusionPipeline
import torch

# Load a large pipeline that might not fit in GPU memory
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Enable model-level CPU offloading (single line optimization)
pipe.enable_model_cpu_offload()

# Run inference as usual - offloading is handled transparently
image = pipe("A serene landscape with rolling hills").images[0]
image.save("landscape.png")

Specifying a GPU

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Use a specific GPU for offloading
pipe.enable_model_cpu_offload(gpu_id=1)

image = pipe("A futuristic city skyline at night").images[0]

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
