Implementation:Bentoml BentoML Framework Diffusers
| Knowledge Sources | |
|---|---|
| Domains | ML Framework, Generative AI, Diffusion Models, Model Serialization |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
The bentoml.diffusers module provides BentoML integration for HuggingFace Diffusers pipelines, enabling saving, importing, loading, and serving of diffusion models such as Stable Diffusion.
Description
This module implements the BentoML framework adapter for the HuggingFace diffusers library. It supports saving and loading DiffusionPipeline instances with extensive configuration options for performance optimization and customization.
ModelOptions (attrs dataclass) captures pipeline configuration including:
- Pipeline and scheduler class selection
- Torch dtype, device map, and custom pipeline paths
- Performance optimizations: xformers, attention slicing, CPU offloading, torch.compile
- LoRA weight loading and textual inversion support
- Low CPU memory usage mode and model variants
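A configuration object like ModelOptions is a natural way to store these settings with the model. Below is a minimal sketch using a stdlib dataclass instead of attrs. The class name `PipelineOptionsSketch` and its `configured()` and `merged_with()` methods are hypothetical and are not BentoML's ModelOptions. The field names mirror the load_model() parameters, and `None` stands for "not configured". The rule that non-None load-time values override stored ones is an assumption made for illustration.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional, Union


@dataclass(frozen=True)
class PipelineOptionsSketch:
    """Hypothetical stand-in for ModelOptions (not BentoML's class).

    Field names mirror load_model() parameters; None means "not configured".
    """

    pipeline_class: Optional[str] = None
    scheduler_class: Optional[str] = None
    torch_dtype: Optional[str] = None
    device_map: Optional[Union[str, Dict[str, Any]]] = None
    custom_pipeline: Optional[str] = None
    low_cpu_mem_usage: Optional[bool] = None
    variant: Optional[str] = None
    enable_xformers: bool = False
    enable_attention_slicing: Optional[Union[int, str]] = None
    enable_model_cpu_offload: Optional[bool] = None
    enable_sequential_cpu_offload: Optional[bool] = None
    enable_torch_compile: Optional[bool] = None

    def configured(self) -> Dict[str, Any]:
        """Return only the options that were set (non-None)."""
        return {k: v for k, v in asdict(self).items() if v is not None}

    def merged_with(self, **overrides: Any) -> "PipelineOptionsSketch":
        """Return a copy in which non-None overrides replace stored values (assumed rule)."""
        return replace(self, **{k: v for k, v in overrides.items() if v is not None})


# Options stored at save time, then overridden at load time.
saved = PipelineOptionsSketch(torch_dtype="float16", enable_attention_slicing="auto")
at_load = saved.merged_with(torch_dtype="bfloat16", variant=None)
print(at_load.configured())
# {'torch_dtype': 'bfloat16', 'enable_xformers': False, 'enable_attention_slicing': 'auto'}
```

The frozen dataclass keeps stored options immutable, so every load-time override produces a new object and leaves the saved configuration unchanged.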
Key functions:
- save_model(): Saves a DiffusionPipeline using save_pretrained() into the BentoML model store.
- import_model(): Downloads a model from HuggingFace Hub or copies it from a local directory, supporting version sync with Hub commit hashes and variant selection.
- load_model(): Loads a pipeline from the model store with full configuration of scheduler, device placement, optimizations, LoRA weights, and textual inversions.
- get_runnable(): Creates a DiffusersRunnable that auto-selects GPU/CPU, applies optimizations, and supports runtime scheduler replacement and LoRA weight swapping.
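The version-sync option of import_model() ties a BentoML tag (`name:version`) to a Hub commit. The sketch below shows one plausible way to derive the tag. The helper `resolve_model_tag` and its precedence rules are hypothetical. This sketch always prefers the commit hash when syncing, whereas a real implementation might instead reject an explicit version that conflicts with the hash.

```python
def resolve_model_tag(
    name: str,
    hub_commit_hash: str,
    sync_with_hub_version: bool,
    fallback_version: str,
) -> str:
    """Hypothetical helper: build a 'name:version' tag string for an imported model.

    Assumption: with sync_with_hub_version=True the Hub commit hash becomes the
    version; otherwise an explicit version in `name` wins, then `fallback_version`.
    """
    # Split an optional explicit version off "name:version".
    base, _, explicit_version = name.partition(":")
    if sync_with_hub_version:
        version = hub_commit_hash
    else:
        version = explicit_version or fallback_version
    return f"{base}:{version}"


# Usage with a made-up commit hash and locally generated version:
print(resolve_model_tag("my_sd15_model", "0123abcd", True, "local-v1"))
# my_sd15_model:0123abcd
```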
Internal helpers handle LoRA argument parsing (file paths, HuggingFace repo IDs, or dictionaries), textual inversion loading, and class string-to-type resolution.
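The following sketch shows how helpers of this kind could work. `normalize_lora_option` turns a file path, a directory or HuggingFace repo ID, or a dict into keyword arguments for diffusers' `load_lora_weights()`. `resolve_class` turns a dotted string into a class. Both names and the `LoraOption` alias are hypothetical and do not reproduce BentoML's internal helpers. Treating any non-file string as a repo ID or directory is an assumption of the sketch.

```python
import importlib
import os
from typing import Any, Dict, Union

# Hypothetical alias for this sketch; not BentoML's LoraOptionType.
LoraOption = Union[str, Dict[str, Any]]


def normalize_lora_option(option: LoraOption) -> Dict[str, Any]:
    """Map a LoRA option to keyword arguments for diffusers' load_lora_weights()."""
    if isinstance(option, dict):
        # Assumption: a dict already holds the keyword arguments.
        return dict(option)
    if os.path.isfile(option):
        # A single local weights file: pass its folder plus the file name.
        folder, filename = os.path.split(os.path.abspath(option))
        return {"pretrained_model_name_or_path_or_dict": folder, "weight_name": filename}
    # Otherwise treat the string as a local directory or a HuggingFace repo ID.
    return {"pretrained_model_name_or_path_or_dict": option}


def resolve_class(spec: Union[str, type], default_module: str = "diffusers") -> type:
    """Resolve 'package.module.ClassName', or a bare 'ClassName' in default_module."""
    if isinstance(spec, type):
        return spec
    module_name, _, class_name = spec.rpartition(".")
    module = importlib.import_module(module_name or default_module)
    return getattr(module, class_name)
```

With diffusers installed, `resolve_class("StableDiffusionPipeline")` would return `diffusers.StableDiffusionPipeline`. A pipeline could then call `pipe.load_lora_weights(**normalize_lora_option(option))` for each configured LoRA option.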
Usage
Use this module to save, import, load, and serve diffusion models (Stable Diffusion, SDXL, etc.) within BentoML services. It supports both saving locally trained pipelines and importing pre-trained models from HuggingFace Hub.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/_internal/frameworks/diffusers.py
- Lines: 1-868
Signature
```python
def get(tag_like: str | Tag) -> bentoml.Model: ...

def load_model(
    bento_model: str | Tag | bentoml.Model,
    device_id: str | torch.device | None = None,
    pipeline_class: str | type[DiffusionPipeline] = DiffusionPipeline,
    device_map: str | dict | None = None,
    custom_pipeline: str | None = None,
    scheduler_class: type[SchedulerMixin] | None = None,
    torch_dtype: str | torch.dtype | None = None,
    low_cpu_mem_usage: bool | None = None,
    enable_xformers: bool = False,
    enable_attention_slicing: int | str | None = None,
    enable_model_cpu_offload: bool | None = None,
    enable_sequential_cpu_offload: bool | None = None,
    enable_torch_compile: bool | None = None,
    variant: str | None = None,
    lora_weights: LoraOptionType | list[LoraOptionType] | None = None,
    textual_inversions: TextualInversionOptionType | list | None = None,
    load_pretrained_extra_kwargs: dict[str, Any] | None = None,
) -> diffusers.DiffusionPipeline: ...

def import_model(
    name: Tag | str,
    model_name_or_path: str | os.PathLike[str],
    *,
    proxies: dict | None = None,
    revision: str = "main",
    variant: str | None = None,
    pipeline_class: str | type | None = None,
    sync_with_hub_version: bool = False,
    signatures: dict | None = None,
    labels: dict | None = None,
    metadata: dict | None = None,
) -> bentoml.Model: ...

def save_model(
    name: Tag | str,
    pipeline: diffusers.DiffusionPipeline,
    *,
    signatures: dict | None = None,
    labels: dict | None = None,
    metadata: dict | None = None,
) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...
```
Import
```python
import bentoml

# Via public API
model = bentoml.diffusers.save_model(...)
pipeline = bentoml.diffusers.load_model(...)
model = bentoml.diffusers.import_model(...)
```
I/O Contract
Inputs
save_model()
| Name | Type | Required | Description |
|---|---|---|---|
| name | Tag or str | Yes | Name/tag for the model in the BentoML store |
| pipeline | DiffusionPipeline | Yes | The diffusers pipeline instance to save |
| signatures | dict or None | No | Inference method signatures (default: {"__call__": {"batchable": False}}) |
| labels | dict[str, str] or None | No | User-defined labels for model management |
| custom_objects | dict[str, Any] or None | No | Additional objects to serialize |
| metadata | dict[str, Any] or None | No | Custom metadata for the model |
import_model()
| Name | Type | Required | Description |
|---|---|---|---|
| name | Tag or str | Yes | Name/tag for the model in the BentoML store |
| model_name_or_path | str or PathLike | Yes | HuggingFace repo ID or local directory path |
| proxies | dict or None | No | Proxy servers for download |
| revision | str | No | Model version (branch, tag, or commit; default: "main") |
| variant | str or None | No | Model variant (e.g., "fp16", "fp32") |
| pipeline_class | str or type or None | No | Pipeline class to use for downloading |
| sync_with_hub_version | bool | No | Sync BentoML version tag with hub commit hash (default: False) |
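In diffusers, a variant selects alternative weight files whose names carry the variant just before the file extension. For example, `diffusion_pytorch_model.fp16.safetensors` is the `fp16` variant of `diffusion_pytorch_model.safetensors`. The helper below reproduces that naming convention so you can see which files `variant="fp16"` refers to. Its name, `variant_weights_name`, is hypothetical, and it does not describe how import_model() actually downloads files.

```python
from typing import Optional


def variant_weights_name(weights_name: str, variant: Optional[str] = None) -> str:
    """Insert a variant tag before the file extension, e.g. 'fp16' before 'safetensors'."""
    if variant is None:
        return weights_name
    parts = weights_name.split(".")
    # Everything except the final extension, then the variant, then the extension.
    return ".".join(parts[:-1] + [variant] + parts[-1:])


print(variant_weights_name("diffusion_pytorch_model.safetensors", "fp16"))
# diffusion_pytorch_model.fp16.safetensors
```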
load_model()
| Name | Type | Required | Description |
|---|---|---|---|
| bento_model | str, Tag, or Model | Yes | Tag or Model instance to load |
| device_id | str, torch.device, or None | No | Target device for the pipeline |
| pipeline_class | str or type | No | Pipeline class (default: DiffusionPipeline) |
| scheduler_class | type or None | No | Override scheduler class |
| torch_dtype | str, torch.dtype, or None | No | Override default dtype |
| enable_xformers | bool | No | Enable xformers memory-efficient attention (default: False) |
| lora_weights | LoraOptionType or None | No | LoRA weights to load into the pipeline |
| textual_inversions | TextualInversionOptionType or None | No | Textual inversions to load |
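The load-time optimization flags interact. With diffusers' `enable_model_cpu_offload()` or `enable_sequential_cpu_offload()`, the pipeline manages device placement itself, so it should not also be moved to the GPU with `.to()`. The sketch below turns the flags into an ordered list of pipeline calls to make that interaction visible. The function `plan_pipeline_setup`, its `device` parameter (standing in for `device_id`), the step order, and the decision to reject both offload modes at once are assumptions, not BentoML's actual load sequence. The method names in the returned strings are real diffusers pipeline methods.

```python
from typing import List, Optional, Union


def plan_pipeline_setup(
    device: str = "cuda",
    enable_model_cpu_offload: Optional[bool] = None,
    enable_sequential_cpu_offload: Optional[bool] = None,
    enable_attention_slicing: Optional[Union[int, str]] = None,
    enable_xformers: bool = False,
    enable_torch_compile: Optional[bool] = None,
) -> List[str]:
    """Hypothetical: list the pipeline calls a loader might make, in order."""
    if enable_model_cpu_offload and enable_sequential_cpu_offload:
        # Assumption: treat the two offload modes as mutually exclusive.
        raise ValueError("choose either model or sequential CPU offload, not both")
    steps: List[str] = []
    if enable_sequential_cpu_offload:
        steps.append("enable_sequential_cpu_offload()")
    elif enable_model_cpu_offload:
        steps.append("enable_model_cpu_offload()")
    else:
        # Without offloading, move the whole pipeline to the target device.
        steps.append(f"to({device!r})")
    if enable_attention_slicing is not None:
        steps.append(f"enable_attention_slicing({enable_attention_slicing!r})")
    if enable_xformers:
        steps.append("enable_xformers_memory_efficient_attention()")
    if enable_torch_compile:
        # Assumption: compile the UNet, the main denoising loop in Stable Diffusion.
        steps.append("unet = torch.compile(unet)")
    return steps


print(plan_pipeline_setup(enable_model_cpu_offload=True, enable_attention_slicing="auto"))
# ['enable_model_cpu_offload()', "enable_attention_slicing('auto')"]
```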
Outputs
| Method | Return Type | Description |
|---|---|---|
| save_model() | bentoml.Model | A BentoML Model referencing the saved diffusion pipeline |
| import_model() | bentoml.Model | A BentoML Model referencing the imported diffusion model |
| load_model() | DiffusionPipeline | The loaded and configured diffusers pipeline |
| get() | bentoml.Model | The BentoML Model reference from the store |
| get_runnable() | type[Runnable] | A DiffusersRunnable class with scheduler swap and LoRA support |
Usage Examples
```python
import bentoml
from diffusers import StableDiffusionPipeline

# Import a model from HuggingFace Hub into the BentoML model store
bentoml.diffusers.import_model(
    "my_sd15_model",
    "runwayml/stable-diffusion-v1-5",
    signatures={"__call__": {"batchable": False}},
)

# Load the pipeline and generate an image
pipeline = bentoml.diffusers.load_model("my_sd15_model:latest")
result = pipeline(prompt="a photo of an astronaut riding a horse")
image = result.images[0]  # a PIL image with the default output_type
image.save("astronaut.png")

# Save a locally created pipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
bento_model = bentoml.diffusers.save_model("my_sd_model", pipe)
```