Implementation:Bentoml BentoML Framework Diffusers

From Leeroopedia
Knowledge Sources
Domains ML Framework, Generative AI, Diffusion Models, Model Serialization
Last Updated 2026-02-13 15:00 GMT

Overview

The bentoml.diffusers module integrates HuggingFace Diffusers pipelines with BentoML, so that diffusion models such as Stable Diffusion can be saved, imported, loaded, and served.

Description

This module implements the BentoML framework adapter for the HuggingFace diffusers library. It supports saving and loading DiffusionPipeline instances with extensive configuration options for performance optimization and customization.

ModelOptions (attrs dataclass) captures pipeline configuration including:

  • Pipeline and scheduler class selection
  • Torch dtype, device map, and custom pipeline paths
  • Performance optimizations: xformers, attention slicing, CPU offloading, torch.compile
  • LoRA weight loading and textual inversion support
  • Low CPU memory usage mode and model variants
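The option groups listed above can be pictured as a single immutable settings object. The following is a minimal sketch only: PipelineOptionsSketch is a hypothetical stand-in written with the standard-library dataclasses module rather than attrs, its field names are borrowed from the load_model() parameters, and the overrides() helper is an illustrative addition, not part of BentoML.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class PipelineOptionsSketch:
    """Hypothetical stand-in for ModelOptions (not the BentoML class)."""

    # Pipeline and scheduler class selection
    pipeline_class: str = "diffusers.DiffusionPipeline"
    scheduler_class: str | None = None
    # Torch dtype, device map, and custom pipeline path
    torch_dtype: str | None = None
    device_map: str | dict[str, Any] | None = None
    custom_pipeline: str | None = None
    # Performance optimizations
    enable_xformers: bool = False
    enable_attention_slicing: int | str | None = None
    enable_model_cpu_offload: bool | None = None
    enable_sequential_cpu_offload: bool | None = None
    enable_torch_compile: bool | None = None
    # Low CPU memory usage mode and model variant
    low_cpu_mem_usage: bool | None = None
    variant: str | None = None
    # LoRA weights and textual inversions
    lora_weights: list[Any] = field(default_factory=list)
    textual_inversions: list[Any] = field(default_factory=list)

    def overrides(self) -> dict[str, Any]:
        """Return only the fields whose values differ from the defaults."""
        defaults = asdict(PipelineOptionsSketch())
        return {k: v for k, v in asdict(self).items() if v != defaults[k]}


opts = PipelineOptionsSketch(torch_dtype="float16", enable_xformers=True, variant="fp16")
print(opts.overrides())
```

Freezing the object keeps the configuration stable between the moment a model is saved and the moment it is loaded, which is one plausible reason for using a declarative options class here.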

Key functions:

  • save_model(): Saves a DiffusionPipeline using save_pretrained() into the BentoML model store.
  • import_model(): Downloads a model from HuggingFace Hub or copies from a local directory, supporting version sync with hub commit hashes and variant selection.
  • load_model(): Loads a pipeline from the model store with full configuration of scheduler, device placement, optimizations, LoRA weights, and textual inversions.
  • get_runnable(): Creates a DiffusersRunnable that auto-selects GPU/CPU, applies optimizations, and supports runtime scheduler replacement and LoRA weight swapping.

Internal helpers handle LoRA argument parsing (file paths, HuggingFace repo IDs, or dictionaries), textual inversion loading, and class string-to-type resolution.
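To make the helper behavior described above concrete, here is a rough sketch of how a LoRA option and a class string might be normalized. The function names normalize_lora and resolve_class, the "model_name" and "weight_name" keys, and the dotted-path convention for class strings are all assumptions made for illustration; the actual internal helpers may differ.

```python
from __future__ import annotations

import importlib
import os
from typing import Any


def normalize_lora(option: str | dict[str, Any]) -> dict[str, Any]:
    """Turn one LoRA option into a uniform dict (hypothetical key names)."""
    if isinstance(option, dict):
        if "model_name" not in option:
            raise ValueError("LoRA dict needs a 'model_name' entry")
        return dict(option)  # copy so the caller's dict is not mutated
    if os.path.isfile(option):
        # A path to an existing file is split into directory and weight file.
        return {
            "model_name": os.path.dirname(option),
            "weight_name": os.path.basename(option),
        }
    # Anything else is assumed to be a HuggingFace repo ID or a directory.
    return {"model_name": option}


def resolve_class(ref: str | type) -> type:
    """Resolve a dotted 'module.ClassName' string to the class it names."""
    if isinstance(ref, type):
        return ref
    module_name, _, attr = ref.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {ref!r}")
    obj = getattr(importlib.import_module(module_name), attr)
    if not isinstance(obj, type):
        raise TypeError(f"{ref!r} does not name a class")
    return obj


print(normalize_lora("some-user/some-lora"))
print(resolve_class("collections.OrderedDict"))
```

Normalizing every accepted form to one dictionary shape up front means the loading code only has to handle a single case.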

Usage

Use this module to save, import, load, and serve diffusion models (Stable Diffusion, SDXL, etc.) within BentoML services. It supports both saving locally trained pipelines and importing pre-trained models from HuggingFace Hub.

Code Reference

Source Location

Signature

def get(tag_like: str | Tag) -> bentoml.Model: ...

def load_model(bento_model: str | Tag | bentoml.Model,
               device_id: str | torch.device | None = None,
               pipeline_class: str | type[DiffusionPipeline] = DiffusionPipeline,
               device_map: str | dict | None = None,
               custom_pipeline: str | None = None,
               scheduler_class: type[SchedulerMixin] | None = None,
               torch_dtype: str | torch.dtype | None = None,
               low_cpu_mem_usage: bool | None = None,
               enable_xformers: bool = False,
               enable_attention_slicing: int | str | None = None,
               enable_model_cpu_offload: bool | None = None,
               enable_sequential_cpu_offload: bool | None = None,
               enable_torch_compile: bool | None = None,
               variant: str | None = None,
               lora_weights: LoraOptionType | list[LoraOptionType] | None = None,
               textual_inversions: TextualInversionOptionType | list | None = None,
               load_pretrained_extra_kwargs: dict[str, Any] | None = None,
               ) -> diffusers.DiffusionPipeline: ...

def import_model(name: Tag | str,
                 model_name_or_path: str | os.PathLike[str],
                 *, proxies: dict | None = None,
                 revision: str = "main",
                 variant: str | None = None,
                 pipeline_class: str | type | None = None,
                 sync_with_hub_version: bool = False,
                 signatures: dict | None = None,
                 labels: dict | None = None,
                 metadata: dict | None = None,
                 ) -> bentoml.Model: ...

def save_model(name: Tag | str,
               pipeline: diffusers.DiffusionPipeline,
               *, signatures: dict | None = None,
               labels: dict | None = None,
               custom_objects: dict[str, Any] | None = None,
               metadata: dict | None = None,
               ) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...

Import

import bentoml

# Via public API
model = bentoml.diffusers.save_model(...)
pipeline = bentoml.diffusers.load_model(...)
model = bentoml.diffusers.import_model(...)

I/O Contract

Inputs

save_model()

Name | Type | Required | Description
name | Tag or str | Yes | Name/tag for the model in the BentoML store
pipeline | DiffusionPipeline | Yes | The diffusers pipeline instance to save
signatures | dict or None | No | Inference method signatures (default: {"__call__": {"batchable": False}})
labels | dict[str, str] or None | No | User-defined labels for model management
custom_objects | dict[str, Any] or None | No | Additional objects to serialize
metadata | dict[str, Any] or None | No | Custom metadata for the model

import_model()

Name | Type | Required | Description
name | Tag or str | Yes | Name/tag for the model in the BentoML store
model_name_or_path | str or PathLike | Yes | HuggingFace repo ID or local directory path
proxies | dict or None | No | Proxy servers for download
revision | str | No | Model version (branch, tag, or commit; default: "main")
variant | str or None | No | Model variant (e.g., "fp16", "fp32")
pipeline_class | str or type or None | No | Pipeline class to use for downloading
sync_with_hub_version | bool | No | Sync BentoML version tag with hub commit hash (default: False)

load_model()

Name | Type | Required | Description
bento_model | str, Tag, or Model | Yes | Tag or Model instance to load
device_id | str, torch.device, or None | No | Target device for the pipeline
pipeline_class | str or type | No | Pipeline class (default: DiffusionPipeline)
scheduler_class | type or None | No | Override scheduler class
torch_dtype | str, torch.dtype, or None | No | Override default dtype
enable_xformers | bool | No | Enable xformers memory-efficient attention (default: False)
lora_weights | LoraOptionType or None | No | LoRA weights to load into the pipeline
textual_inversions | TextualInversionOptionType or None | No | Textual inversions to load
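Because load_model() takes many optional parameters, it can help to assemble them in one place before the call. The sketch below is one possible way to do that. build_load_kwargs is a hypothetical helper, and the choices it makes (for example, leaving device_id unset when model CPU offload is requested) are assumptions rather than documented BentoML behavior. The keys it emits are parameter names taken from the load_model() signature above.

```python
from __future__ import annotations

from typing import Any


def build_load_kwargs(
    *,
    on_gpu: bool,
    low_vram: bool = False,
    dtype: Any = None,
    lora: Any = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for load_model() (hypothetical helper)."""
    kwargs: dict[str, Any] = {}
    if on_gpu and low_vram:
        # Assumption: offloading manages device placement itself, so no
        # explicit device_id is passed alongside it.
        kwargs["enable_model_cpu_offload"] = True
        kwargs["enable_attention_slicing"] = "auto"
    elif on_gpu:
        kwargs["device_id"] = "cuda"
    else:
        kwargs["device_id"] = "cpu"
    if dtype is not None:
        kwargs["torch_dtype"] = dtype  # e.g. torch.float16 on a GPU
    if lora is not None:
        kwargs["lora_weights"] = lora
    return kwargs


print(build_load_kwargs(on_gpu=True, low_vram=True))

# Intended use (requires bentoml and a model already in the store):
#   pipeline = bentoml.diffusers.load_model(
#       "my_sd15_model:latest", **build_load_kwargs(on_gpu=True)
#   )
```

Keeping the decision logic in a plain function makes it easy to unit-test without a GPU or a downloaded model.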

Outputs

Method | Return Type | Description
save_model() | bentoml.Model | A BentoML Model referencing the saved diffusion pipeline
import_model() | bentoml.Model | A BentoML Model referencing the imported diffusion model
load_model() | DiffusionPipeline | The loaded and configured diffusers pipeline
get() | bentoml.Model | The BentoML Model reference from the store
get_runnable() | type[Runnable] | A DiffusersRunnable class with scheduler swap and LoRA support

Usage Examples

import bentoml
from diffusers import StableDiffusionPipeline

# Import a model from HuggingFace Hub into the BentoML model store
bentoml.diffusers.import_model(
    "my_sd15_model",
    "runwayml/stable-diffusion-v1-5",
    signatures={"__call__": {"batchable": False}},
)

# Load the pipeline from the store and generate an image
pipeline = bentoml.diffusers.load_model("my_sd15_model:latest")
result = pipeline(prompt="a photo of an astronaut riding a horse")
image = result.images[0]  # first generated image from the pipeline output

# Save a locally created pipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
bento_model = bentoml.diffusers.save_model("my_sd_model", pipe)
