

Implementation:Bentoml BentoML Framework Transformers

From Leeroopedia
Knowledge Sources
Domains: ML Framework, NLP, Generative AI, Model Serialization
Last Updated: 2026-02-13 15:00 GMT

Overview

The bentoml.transformers module provides BentoML integration for HuggingFace Transformers, enabling save, import, load, and serving of Transformers pipelines, pre-trained models, tokenizers, and custom pipelines.

Description

This adapter covers the full breadth of HuggingFace Transformers objects. It handles both transformers.Pipeline instances and individual pre-trained objects (models, tokenizers, feature extractors, and image processors).

ModelOptions (attrs dataclass) captures:

  • task: Pipeline task name (e.g., "text-generation", "sentiment-analysis")
  • tf, pt: Tuples of auto model class names for TensorFlow and PyTorch
  • default: Default model mappings for the task
  • type: Pipeline type (text, audio, image, multimodal)
  • kwargs: Extra keyword arguments passed to the pipeline
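To make the shape of these options concrete, here is a minimal stand-in written with the standard library's dataclasses instead of attrs. The field names follow the list above. The class name ModelOptionsSketch, the types, the defaults, and the with_kwargs helper are assumptions for illustration, not BentoML's actual definitions.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ModelOptionsSketch:
    """Hypothetical stand-in for BentoML's attrs-based ModelOptions."""

    task: str                                  # pipeline task, e.g. "text-generation"
    tf: Tuple[str, ...] = ()                   # TensorFlow auto-model class names
    pt: Tuple[str, ...] = ()                   # PyTorch auto-model class names
    default: Optional[Dict[str, Any]] = None   # default model mapping for the task
    type: Optional[str] = None                 # "text", "audio", "image" or "multimodal"
    kwargs: Dict[str, Any] = field(default_factory=dict)  # extra pipeline kwargs

    def with_kwargs(self, **extra: Any) -> "ModelOptionsSketch":
        # Frozen instances cannot be mutated, so return a copy with merged kwargs.
        return replace(self, kwargs={**self.kwargs, **extra})


# Example: options for a causal language-model pipeline, then a tuned copy.
opts = ModelOptionsSketch(task="text-generation", pt=("AutoModelForCausalLM",), type="text")
tuned = opts.with_kwargs(max_new_tokens=20)
```

Freezing the sketch keeps stored options from being mutated in place after a model is saved. Whether BentoML's own class is frozen is not stated here.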

Key functions:

  • save_model(): Saves a Pipeline or PreTrained object. For pipelines, saves the pipeline class via cloudpickle and model weights via save_pretrained(). For pretrained objects, saves the class and weights separately. Supports custom pipeline registration.
  • import_model(): Downloads a model from the HuggingFace Hub using from_pretrained() or snapshot_download(). Supports a clone_repository mode and can sync the stored model version with the Hub commit hash. Uses init_empty_weights() to avoid loading model weights into memory during import.
  • load_model(): Loads models with backward compatibility (v1 and v2 API versions). Handles custom pipeline registration, pretrained protocol loading, and automatic task detection. Supports trust_remote_code.
  • get_runnable(): Creates a TransformersRunnable with framework-aware GPU placement (PyTorch, TensorFlow, or Flax/JAX).
  • make_default_signatures(): Automatically infers appropriate method signatures based on the pretrained class type (tokenizer, model, image processor, etc.).
  • register_pipeline() / delete_pipeline(): Manage custom pipeline registrations in the Transformers pipeline registry.
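The save_model() split between "which class to rebuild" and "the weights written by save_pretrained()" can be sketched in plain Python. FakePretrained, CLASS_REGISTRY, save_object and load_object are all hypothetical. BentoML pickles the class itself with cloudpickle; this sketch records the class name in JSON and resolves it through a registry so that it stays standard-library only.

```python
import json
import os
import tempfile
from typing import Dict, List, Type


class FakePretrained:
    """Hypothetical stand-in for a Transformers object with save/from_pretrained."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = weights

    def save_pretrained(self, path: str) -> None:
        with open(os.path.join(path, "weights.json"), "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def from_pretrained(cls, path: str) -> "FakePretrained":
        with open(os.path.join(path, "weights.json")) as f:
            return cls(json.load(f))


# Hypothetical registry used to resolve the saved class name at load time.
CLASS_REGISTRY: Dict[str, Type[FakePretrained]] = {"FakePretrained": FakePretrained}


def save_object(obj: FakePretrained, path: str) -> None:
    # Record which class to rebuild (BentoML stores the class via cloudpickle instead).
    with open(os.path.join(path, "class.json"), "w") as f:
        json.dump({"class": type(obj).__name__}, f)
    # Delegate the weights to the object's own save_pretrained().
    obj.save_pretrained(path)


def load_object(path: str) -> FakePretrained:
    with open(os.path.join(path, "class.json")) as f:
        cls = CLASS_REGISTRY[json.load(f)["class"]]
    # Rebuild the object from the saved weights via the class's from_pretrained().
    return cls.from_pretrained(path)


with tempfile.TemporaryDirectory() as model_dir:
    save_object(FakePretrained([0.1, 0.2]), model_dir)
    restored = load_object(model_dir)
```

Keeping the class reference separate from the weights means the loader never needs to know the concrete type ahead of time, which is the property that lets one load_model() entry point serve many model classes.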

The module supports both the newer PIPELINE_REGISTRY (transformers >= 4.21) and legacy SUPPORTED_TASKS approaches.
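A version-gated choice between the two registries might look like the sketch below. parse_version and task_registry_name are hypothetical helpers that use the 4.21 threshold stated above. A production adapter would more likely use packaging.version or probe for the attribute directly.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # Keep only the leading numeric components, e.g. "4.21.0.dev0" -> (4, 21, 0).
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def task_registry_name(transformers_version: str) -> str:
    # transformers >= 4.21 exposes PIPELINE_REGISTRY; older releases use SUPPORTED_TASKS.
    if parse_version(transformers_version) >= (4, 21):
        return "PIPELINE_REGISTRY"
    return "SUPPORTED_TASKS"
```

Comparing integer tuples avoids the string-comparison trap where "4.9" would sort after "4.21".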

Usage

Use this module to save, import, load, and serve any HuggingFace Transformers model or pipeline within BentoML services. Covers NLP, vision, audio, and multimodal tasks.

Code Reference

Source Location

Signature

def get(tag_like: str | Tag) -> Model: ...

def load_model(bento_model: str | Tag | Model,
               *args: Any, **kwargs: Any) -> transformers.Pipeline | TransformersPreTrained: ...

def import_model(name: Tag | str,
                 model_name_or_path: str | os.PathLike[str],
                 *, proxies: dict | None = None,
                 revision: str = "main",
                 force_download: bool = False,
                 trust_remote_code: bool = False,
                 clone_repository: bool = False,
                 sync_with_hub_version: bool = False,
                 signatures: ModelSignaturesType | None = None,
                 labels: dict | None = None,
                 metadata: dict | None = None,
                 **extra_hf_hub_kwargs: dict) -> bentoml.Model: ...

def save_model(name: Tag | str,
               pretrained_or_pipeline: TransformersPreTrained | Pipeline | None = None,
               pipeline: Pipeline | None = None,
               task_name: str | None = None,
               task_definition: dict | TaskDefinition | None = None,
               *, signatures: ModelSignaturesType | None = None,
               labels: dict | None = None,
               custom_objects: dict | None = None,
               external_modules: list[ModuleType] | None = None,
               metadata: dict | None = None,
               **save_kwargs: Any) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...

def register_pipeline(task: str, impl: type[Pipeline], ...) -> None: ...
def delete_pipeline(task: str) -> None: ...
def make_default_signatures(pretrained_cls: Any) -> ModelSignaturesType: ...
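make_default_signatures() chooses method signatures from the class type. The idea can be sketched as a lookup from a class-name pattern to a list of method names, each marked non-batchable in the same dictionary form that the import_model() example passes as signatures ({"__call__": {"batchable": False}}). sketch_default_signatures and DEFAULT_METHODS are hypothetical. The patterns and method lists are assumptions for illustration, not BentoML's actual mapping.

```python
from typing import Dict, Tuple

# Hypothetical mapping from a class-name fragment to the methods exposed by default.
DEFAULT_METHODS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("Tokenizer", ("__call__", "encode", "decode", "batch_decode")),
    ("ImageProcessor", ("__call__", "preprocess")),
    ("FeatureExtractor", ("__call__",)),
    ("ForCausalLM", ("__call__", "generate")),
)


def sketch_default_signatures(cls_name: str) -> Dict[str, Dict[str, bool]]:
    for fragment, methods in DEFAULT_METHODS:
        # Substring match so that e.g. "GPT2TokenizerFast" still counts as a tokenizer.
        if fragment in cls_name:
            return {m: {"batchable": False} for m in methods}
    # Fall back to exposing only __call__ (e.g. for a Pipeline or a plain model).
    return {"__call__": {"batchable": False}}
```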

Import

import bentoml

# Via public API
model = bentoml.transformers.save_model(...)
pipeline = bentoml.transformers.load_model(...)
model = bentoml.transformers.import_model(...)

I/O Contract

Inputs

save_model()

| Name | Type | Required | Description |
|---|---|---|---|
| name | Tag or str | Yes | Name/tag for the model in the BentoML store |
| pretrained_or_pipeline | Pipeline, PreTrained, or PreTrainedProtocol | Yes | The Transformers object to save |
| task_name | str or None | No | Pipeline task name (required for custom pipelines) |
| task_definition | dict or TaskDefinition or None | No | Task definition for custom pipelines (requires impl, pt/tf, default, type keys) |
| signatures | ModelSignaturesType or None | No | Inference method signatures (auto-inferred from class type) |
| labels | dict[str, str] or None | No | User-defined labels |
| custom_objects | dict[str, Any] or None | No | Additional objects to serialize |
| external_modules | List[ModuleType] or None | No | Additional Python modules for custom pipelines |
| metadata | dict[str, Any] or None | No | Custom metadata |
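Before calling save_model() for a custom pipeline, it can help to check the task_definition keys listed above. check_task_definition is a hypothetical pre-flight helper, and MyPipeline is a placeholder for a transformers.Pipeline subclass. Treating pt/tf as "at least one of" is an assumption. The allowed type values come from the ModelOptions description.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("impl", "default", "type")
ALLOWED_TYPES = {"text", "audio", "image", "multimodal"}


def check_task_definition(task_definition: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty means OK)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in task_definition]
    # Assumption: at least one framework-specific model class tuple must be given.
    if not task_definition.get("pt") and not task_definition.get("tf"):
        problems.append("provide at least one of 'pt' or 'tf'")
    kind = task_definition.get("type")
    if kind is not None and kind not in ALLOWED_TYPES:
        problems.append(f"unknown type: {kind}")
    return problems


class MyPipeline:  # placeholder for a transformers.Pipeline subclass
    pass


ok = check_task_definition(
    {"impl": MyPipeline, "pt": ("AutoModelForSequenceClassification",), "default": {}, "type": "text"}
)
bad = check_task_definition({"impl": MyPipeline, "type": "video"})
```

Collecting every problem instead of raising on the first one reports all missing keys in a single pass.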

import_model()

| Name | Type | Required | Description |
|---|---|---|---|
| name | Tag or str | Yes | Name/tag for the model in the BentoML store |
| model_name_or_path | str or PathLike | Yes | HuggingFace repo ID or local directory path |
| proxies | dict or None | No | Proxy servers for download |
| revision | str | No | Model version (default: "main") |
| force_download | bool | No | Force re-download (default: False) |
| trust_remote_code | bool | No | Allow custom code from Hub (default: False) |
| clone_repository | bool | No | Download all files via snapshot_download (default: False) |
| sync_with_hub_version | bool | No | Sync version tag with hub commit hash (default: False) |
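With sync_with_hub_version=True, the stored version follows the Hub commit hash, so a saved model can be traced to an exact upstream revision. Below is a minimal sketch of that tagging rule, assuming a "name:version" tag string and a random fallback version. bento_tag is a hypothetical helper. BentoML's own generated versions use their own format.

```python
import uuid
from typing import Optional


def bento_tag(name: str, commit_hash: Optional[str], sync_with_hub_version: bool) -> str:
    if sync_with_hub_version:
        if not commit_hash:
            raise ValueError("sync_with_hub_version=True needs the Hub commit hash")
        # Reuse the upstream commit hash as the version so the tag pins an exact revision.
        return f"{name}:{commit_hash}"
    # Otherwise generate a fresh, unrelated version string (format is an assumption).
    return f"{name}:{uuid.uuid4().hex[:16]}"
```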

load_model()

| Name | Type | Required | Description |
|---|---|---|---|
| bento_model | str, Tag, or Model | Yes | Tag or Model instance to load |
| args | Any | No | Additional args for PreTrained.from_pretrained() |
| kwargs | Any | No | Additional kwargs for pipeline or from_pretrained() |
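load_model() accepts a tag string such as "text-generation-pipeline:latest". The name:version convention can be illustrated with a small parser that treats a missing version as "latest". split_tag is a hypothetical helper. In BentoML itself, the Tag type that appears in the signatures above handles parsing and validation.

```python
from typing import Tuple


def split_tag(tag_like: str) -> Tuple[str, str]:
    """Hypothetical helper: split "name[:version]", defaulting the version to "latest"."""
    name, sep, version = tag_like.partition(":")
    if not name:
        raise ValueError(f"invalid tag: {tag_like!r}")
    return name, (version if sep and version else "latest")
```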

Outputs

| Method | Return Type | Description |
|---|---|---|
| save_model() | bentoml.Model | A BentoML Model referencing the saved Transformers object |
| import_model() | bentoml.Model | A BentoML Model referencing the imported Transformers model |
| load_model() | Pipeline or TransformersPreTrained | The loaded pipeline or pre-trained object |
| get() | Model | The BentoML Model reference from the store |
| get_runnable() | type[Runnable] | A TransformersRunnable class with framework-aware GPU support |
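The framework-aware placement done by the runnable returned from get_runnable() can be sketched as a function that maps a framework and an optional GPU index to a device identifier in each framework's usual notation: "cuda:N" for PyTorch, "/GPU:N" for TensorFlow, and a backend name for JAX. pick_device and the "pt"/"tf"/"flax" labels are hypothetical. For simplicity, this sketch takes the GPU index as a parameter instead of discovering devices at runtime.

```python
from typing import Optional


def pick_device(framework: str, gpu_index: Optional[int]) -> str:
    """Hypothetical helper mapping (framework, GPU index) to a device string."""
    if framework == "pt":
        return f"cuda:{gpu_index}" if gpu_index is not None else "cpu"
    if framework == "tf":
        return f"/GPU:{gpu_index}" if gpu_index is not None else "/CPU:0"
    if framework == "flax":
        # JAX places arrays on a default backend; "gpu" or "cpu" names that backend.
        return "gpu" if gpu_index is not None else "cpu"
    raise ValueError(f"unsupported framework: {framework!r}")
```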

Usage Examples

import bentoml
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Save a pipeline
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
bento_model = bentoml.transformers.save_model(
    "text-generation-pipeline", generator
)

# Import a model from HuggingFace Hub
bentoml.transformers.import_model(
    "my_t5_model", "t5-base",
    signatures={"__call__": {"batchable": False}},
)

# Load the pipeline
loaded_pipeline = bentoml.transformers.load_model("text-generation-pipeline:latest")
result = loaded_pipeline("Hello, world!")

# Save a pre-trained model directly
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
bento_model = bentoml.transformers.save_model("distilgpt2-model", model)
