Implementation:Bentoml BentoML Framework Diffusers

Knowledge Sources	Bentoml_BentoML
Domains	ML Framework, Generative AI, Diffusion Models, Model Serialization
Last Updated	2026-02-13 15:00 GMT

Overview

The bentoml.diffusers module provides BentoML integration for HuggingFace Diffusers pipelines. It covers saving, importing, loading, and serving diffusion models such as Stable Diffusion.

Description

This module implements the BentoML framework adapter for the HuggingFace diffusers library. It supports saving and loading DiffusionPipeline instances with extensive configuration options for performance optimization and customization.

ModelOptions (attrs dataclass) captures pipeline configuration including:

Pipeline and scheduler class selection
Torch dtype, device map, and custom pipeline paths
Performance optimizations: xformers, attention slicing, CPU offloading, torch.compile
LoRA weight loading and textual inversion support
Low CPU memory usage mode and model variants
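The option groups above can be pictured as a plain record of defaults plus per-model overrides. The following is a minimal sketch, not the BentoML class: it uses the standard-library dataclasses module instead of attrs to stay dependency-free, the class name PipelineOptionsSketch and the overrides() helper are hypothetical, and the field names are borrowed from the load_model() signature on this page.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class PipelineOptionsSketch:
    """Hypothetical stand-in for ModelOptions; not the BentoML class."""

    pipeline_class: str = "diffusers.DiffusionPipeline"
    scheduler_class: str | None = None
    torch_dtype: str | None = None
    device_map: str | dict[str, Any] | None = None
    custom_pipeline: str | None = None
    enable_xformers: bool = False
    enable_attention_slicing: int | str | None = None
    enable_model_cpu_offload: bool | None = None
    enable_sequential_cpu_offload: bool | None = None
    enable_torch_compile: bool | None = None
    low_cpu_mem_usage: bool | None = None
    variant: str | None = None

    def overrides(self) -> dict[str, Any]:
        """Return only the fields that differ from their defaults."""
        defaults = asdict(PipelineOptionsSketch())
        return {k: v for k, v in asdict(self).items() if v != defaults[k]}


# Example: a half-precision configuration with xformers enabled.
opts = PipelineOptionsSketch(torch_dtype="float16", enable_xformers=True, variant="fp16")
print(opts.overrides())
```

Keeping every option in one record makes it straightforward to store the configuration alongside the model and to override individual fields at load time.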

Key functions:

save_model(): Saves a DiffusionPipeline using save_pretrained() into the BentoML model store.
import_model(): Downloads a model from HuggingFace Hub or copies one from a local directory. It supports variant selection and can sync the BentoML version with the hub commit hash.
load_model(): Loads a pipeline from the model store with full configuration of scheduler, device placement, optimizations, LoRA weights, and textual inversions.
get_runnable(): Creates a DiffusersRunnable that auto-selects GPU/CPU, applies optimizations, and supports runtime scheduler replacement and LoRA weight swapping.

Internal helpers handle LoRA argument parsing (file paths, HuggingFace repo IDs, or dictionaries), textual inversion loading, and class string-to-type resolution.
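To illustrate the kind of work these helpers do, here is a minimal sketch using only the standard library. The function names resolve_class and normalize_lora, and the dictionary keys model_name and weight_name, are assumptions made for this example; the module's actual helpers and key names may differ.

```python
from __future__ import annotations

import collections
import importlib
import os
from typing import Any


def resolve_class(path_or_type: str | type) -> type:
    """Turn a dotted path such as 'collections.OrderedDict' into the class itself."""
    if isinstance(path_or_type, type):
        return path_or_type
    module_name, _, attr = path_or_type.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path_or_type!r}")
    return getattr(importlib.import_module(module_name), attr)


def normalize_lora(option: str | os.PathLike[str] | dict[str, Any]) -> dict[str, Any]:
    """Normalize one LoRA option into a dict (key names are hypothetical)."""
    if isinstance(option, dict):
        return dict(option)  # already structured; copy to avoid mutating the caller's dict
    text = os.fspath(option)
    if os.path.isfile(text):
        # A weight file on disk: split it into directory and file name.
        directory, weight_name = os.path.split(text)
        return {"model_name": directory or ".", "weight_name": weight_name}
    # Anything else is treated as a Hub repo ID or a directory.
    return {"model_name": text}


# Usage: a dotted path resolves to a class, and a repo-style string becomes a dict.
ordered_dict_cls = resolve_class("collections.OrderedDict")
lora = normalize_lora("some-user/some-lora")
```

Normalizing every accepted input shape into one dictionary form up front keeps the later loading code free of type checks.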

Usage

Use this module to save, import, load, and serve diffusion models (Stable Diffusion, SDXL, etc.) within BentoML services. It supports both saving locally trained pipelines and importing pre-trained models from HuggingFace Hub.

Code Reference

Source Location

Repository: Bentoml_BentoML
File: src/bentoml/_internal/frameworks/diffusers.py
Lines: 1-868

Signature

def get(tag_like: str | Tag) -> bentoml.Model: ...

def load_model(bento_model: str | Tag | bentoml.Model,
               device_id: str | torch.device | None = None,
               pipeline_class: str | type[DiffusionPipeline] = DiffusionPipeline,
               device_map: str | dict | None = None,
               custom_pipeline: str | None = None,
               scheduler_class: type[SchedulerMixin] | None = None,
               torch_dtype: str | torch.dtype | None = None,
               low_cpu_mem_usage: bool | None = None,
               enable_xformers: bool = False,
               enable_attention_slicing: int | str | None = None,
               enable_model_cpu_offload: bool | None = None,
               enable_sequential_cpu_offload: bool | None = None,
               enable_torch_compile: bool | None = None,
               variant: str | None = None,
               lora_weights: LoraOptionType | list[LoraOptionType] | None = None,
               textual_inversions: TextualInversionOptionType | list | None = None,
               load_pretrained_extra_kwargs: dict[str, Any] | None = None,
               ) -> diffusers.DiffusionPipeline: ...

def import_model(name: Tag | str,
                 model_name_or_path: str | os.PathLike[str],
                 *, proxies: dict | None = None,
                 revision: str = "main",
                 variant: str | None = None,
                 pipeline_class: str | type | None = None,
                 sync_with_hub_version: bool = False,
                 signatures: dict | None = None,
                 labels: dict | None = None,
                 metadata: dict | None = None,
                 ) -> bentoml.Model: ...

def save_model(name: Tag | str,
               pipeline: diffusers.DiffusionPipeline,
               *, signatures: dict | None = None,
               labels: dict | None = None,
               custom_objects: dict[str, Any] | None = None,
               metadata: dict | None = None,
               ) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...

Import

import bentoml

# Via public API
model = bentoml.diffusers.save_model(...)
pipeline = bentoml.diffusers.load_model(...)
model = bentoml.diffusers.import_model(...)

I/O Contract

Inputs

save_model()

Name	Type	Required	Description
name	Tag or str	Yes	Name/tag for the model in the BentoML store
pipeline	DiffusionPipeline	Yes	The diffusers pipeline instance to save
signatures	dict or None	No	Inference method signatures (default: {"__call__": {"batchable": False}})
labels	dict[str, str] or None	No	User-defined labels for model management
custom_objects	dict[str, Any] or None	No	Additional objects to serialize
metadata	dict[str, Any] or None	No	Custom metadata for the model

import_model()

Name	Type	Required	Description
name	Tag or str	Yes	Name/tag for the model in the BentoML store
model_name_or_path	str or PathLike	Yes	HuggingFace repo ID or local directory path
proxies	dict or None	No	Proxy servers for download
revision	str	No	Model version (branch, tag, or commit; default: "main")
variant	str or None	No	Model variant (e.g., "fp16", "fp32")
pipeline_class	str or type or None	No	Pipeline class to use for downloading
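sync_with_hub_version	bool	No	Sync BentoML version tag with hub commit hash (default: False)
signatures	dict or None	No	Inference method signatures
labels	dict or None	No	User-defined labels for model management
metadata	dict or None	No	Custom metadata for the model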
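The effect of sync_with_hub_version can be sketched as a small tag computation. This is an illustration under assumptions, not the module's code: versioned_tag is a hypothetical helper, and the choice to let the commit hash replace any version already present in the name is made for this example only.

```python
from __future__ import annotations


def versioned_tag(name: str, commit_hash: str | None, sync_with_hub_version: bool) -> str:
    """Build a 'name:version' tag, using the hub commit hash as the version when syncing."""
    base = name.split(":", 1)[0]
    if sync_with_hub_version and commit_hash:
        # Assumption for this sketch: the commit hash overrides any explicit version.
        return f"{base}:{commit_hash}"
    # Without syncing (or without a known hash) the name is passed through unchanged.
    return name


# Usage with a made-up commit hash.
synced = versioned_tag("my_sd15_model", "abc123", True)
unsynced = versioned_tag("my_sd15_model", "abc123", False)
```

Tying the version to the commit hash means that importing the same hub revision twice yields the same tag, which makes stored models easier to trace back to their source.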

load_model()

Name	Type	Required	Description
bento_model	str, Tag, or Model	Yes	Tag or Model instance to load
device_id	str, torch.device, or None	No	Target device for the pipeline
pipeline_class	str or type	No	Pipeline class (default: DiffusionPipeline)
scheduler_class	type or None	No	Override scheduler class
torch_dtype	str, torch.dtype, or None	No	Override default dtype
enable_xformers	bool	No	Enable xformers memory-efficient attention (default: False)
lora_weights	LoraOptionType or None	No	LoRA weights to load into the pipeline
textual_inversions	TextualInversionOptionType or None	No	Textual inversions to load
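As a sketch of how these parameters might be combined, the helper below builds keyword arguments for load_model() depending on whether a GPU is available. The names memory_saving_options and load_pipeline are hypothetical, and passing the dtype as the string "float16" and the device as "cuda" are assumptions based on the str types in the signature; check the accepted values against the installed BentoML version.

```python
from __future__ import annotations

from typing import Any


def memory_saving_options(has_gpu: bool) -> dict[str, Any]:
    """Choose load_model() keyword arguments for a GPU or CPU host (illustrative values)."""
    options: dict[str, Any] = {"enable_attention_slicing": "auto"}
    if has_gpu:
        options["device_id"] = "cuda"
        options["torch_dtype"] = "float16"  # assumed string form of torch.float16
    else:
        options["device_id"] = "cpu"
    return options


def load_pipeline(tag: str, has_gpu: bool):
    """Load a pipeline from the model store with the options chosen above."""
    # Imported lazily so the helper above can be used without BentoML installed.
    import bentoml

    return bentoml.diffusers.load_model(tag, **memory_saving_options(has_gpu))


cpu_options = memory_saving_options(has_gpu=False)
gpu_options = memory_saving_options(has_gpu=True)
```

Separating option selection from the load call keeps the device-dependent logic easy to inspect and test without downloading a model.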

Outputs

Method	Return Type	Description
save_model()	bentoml.Model	A BentoML Model referencing the saved diffusion pipeline
import_model()	bentoml.Model	A BentoML Model referencing the imported diffusion model
load_model()	DiffusionPipeline	The loaded and configured diffusers pipeline
get()	bentoml.Model	The BentoML Model reference from the store
get_runnable()	type[Runnable]	A DiffusersRunnable class with scheduler swap and LoRA support

Usage Examples

import bentoml
from diffusers import StableDiffusionPipeline

# Import a model from HuggingFace Hub into the BentoML model store
bentoml.diffusers.import_model(
    "my_sd15_model",
    "runwayml/stable-diffusion-v1-5",
    signatures={"__call__": {"batchable": False}},
)

# Load the pipeline from the store and run inference
pipeline = bentoml.diffusers.load_model("my_sd15_model:latest")
result = pipeline(prompt="a photo of an astronaut riding a horse")

# Save a locally created pipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
bento_model = bentoml.diffusers.save_model("my_sd_model", pipe)
