Implementation:Bentoml BentoML Framework Transformers

Knowledge Sources	Bentoml_BentoML
Domains	ML Framework, NLP, Generative AI, Model Serialization
Last Updated	2026-02-13 15:00 GMT

Overview

The bentoml.transformers module provides BentoML integration for HuggingFace Transformers, enabling save, import, load, and serving of Transformers pipelines, pre-trained models, tokenizers, and custom pipelines.

Description

This adapter covers a wide range of HuggingFace Transformers objects. It handles both transformers.Pipeline instances and individual pre-trained objects (models, tokenizers, feature extractors, and image processors).

ModelOptions (attrs dataclass) captures:

task: Pipeline task name (e.g., "text-generation", "sentiment-analysis")
tf, pt: Tuples of auto model class names for TensorFlow and PyTorch
default: Default model mappings for the task
type: Pipeline type (text, audio, image, multimodal)
kwargs: Extra keyword arguments passed to the pipeline
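
For orientation, the fields above can be mirrored with a stdlib dataclass. This is a sketch under assumptions: the real ModelOptions is an attrs class with its own defaults and validators, while ModelOptionsSketch, its default values, and the to_dict() helper are hypothetical names introduced here.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class ModelOptionsSketch:
    """Hypothetical stand-in for ModelOptions; the real class is built with attrs."""

    task: str  # pipeline task name, e.g. "text-generation"
    tf: Tuple[str, ...] = ()  # TensorFlow auto-model class names
    pt: Tuple[str, ...] = ()  # PyTorch auto-model class names
    default: Dict[str, Any] = field(default_factory=dict)  # default model mapping
    type: Optional[str] = None  # "text", "audio", "image", or "multimodal"
    kwargs: Dict[str, Any] = field(default_factory=dict)  # extra pipeline kwargs

    def to_dict(self) -> Dict[str, Any]:
        # Convert to a plain dict, e.g. for storing next to the saved model.
        return asdict(self)


opts = ModelOptionsSketch(
    task="text-generation",
    pt=("AutoModelForCausalLM",),
    type="text",
)
print(opts.to_dict())
```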

Key functions:

save_model(): Saves a Pipeline or PreTrained object. For pipelines, saves the pipeline class via cloudpickle and model weights via save_pretrained(). For pretrained objects, saves the class and weights separately. Supports custom pipeline registration.
import_model(): Downloads a model from the HuggingFace Hub using from_pretrained() or snapshot_download(). Supports clone_repository mode and syncing the model version with the Hub commit hash. Uses init_empty_weights() to avoid loading model weights into memory during import.
load_model(): Loads models with backward compatibility (v1 and v2 API versions). Handles custom pipeline registration, pretrained protocol loading, and automatic task detection. Supports trust_remote_code.
get_runnable(): Creates a TransformersRunnable with framework-aware GPU placement (PyTorch, TensorFlow, or Flax/JAX).
make_default_signatures(): Automatically infers appropriate method signatures based on the pretrained class type (tokenizer, model, image processor, etc.).
register_pipeline() / delete_pipeline(): Manage custom pipeline registrations in the Transformers pipeline registry.
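
make_default_signatures() chooses default inference methods from the kind of pretrained object it receives. The sketch below illustrates that kind of dispatch under stated assumptions: default_signatures_sketch() and the Demo* classes are hypothetical, the method lists are guesses rather than BentoML's actual rules, and name suffixes plus a hasattr() check stand in for whatever type inspection the real function performs.

```python
from typing import Dict


def default_signatures_sketch(cls: type) -> Dict[str, Dict[str, bool]]:
    """Hypothetical heuristic mapping a pretrained class to method signatures.

    The method lists are illustrative assumptions, not BentoML's actual rules.
    """
    name = cls.__name__
    if name.endswith(("Tokenizer", "TokenizerFast")):
        methods = ["__call__", "encode", "decode", "batch_decode"]
    elif name.endswith(("FeatureExtractor", "Processor")):
        # Covers feature extractors and (image) processors.
        methods = ["__call__"]
    else:
        # Treat anything else as a model; expose generate() only if the class has it.
        methods = ["__call__", "forward"]
        if hasattr(cls, "generate"):
            methods.append("generate")
    # Every method is non-batchable in this sketch; callers opt in explicitly.
    return {method: {"batchable": False} for method in methods}


# Hypothetical placeholder classes standing in for real Transformers classes.
class DemoTokenizerFast:
    def encode(self, text): ...


class DemoImageProcessor:
    def __call__(self, images): ...


class DemoForCausalLM:
    def forward(self, inputs): ...

    def generate(self, inputs): ...


for demo_cls in (DemoTokenizerFast, DemoImageProcessor, DemoForCausalLM):
    print(demo_cls.__name__, sorted(default_signatures_sketch(demo_cls)))
```

Passing explicit signatures to save_model() or import_model() overrides whatever defaults are inferred, as the usage example's {"batchable": False} entry does.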

The module supports both the newer PIPELINE_REGISTRY API (transformers >= 4.21) and the legacy SUPPORTED_TASKS dictionary used by older transformers releases.
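
Choosing between the two registration mechanisms comes down to a version check. A minimal stdlib-only sketch, assuming the 4.21 cutoff stated above; parse_version() and registry_strategy() are hypothetical helpers, and real code would more likely compare packaging.version.Version objects.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. "4.21.0.dev0" -> (4, 21, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at pre-release or dev suffixes such as "0rc1" or "dev0"
        parts.append(int(piece))
    return tuple(parts)


def registry_strategy(transformers_version: str) -> str:
    """Hypothetical helper: pick the pipeline registration mechanism to use."""
    if parse_version(transformers_version) >= (4, 21):
        return "PIPELINE_REGISTRY"  # newer registry API
    return "SUPPORTED_TASKS"  # legacy task dictionary


for v in ("4.20.1", "4.21.0", "4.40.2"):
    print(v, registry_strategy(v))
```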

Usage

Use this module to save, import, load, and serve any HuggingFace Transformers model or pipeline within BentoML services. Covers NLP, vision, audio, and multimodal tasks.

Code Reference

Source Location

Repository: Bentoml_BentoML
File: src/bentoml/_internal/frameworks/transformers.py
Lines: 1-1260

Signature

def get(tag_like: str | Tag) -> Model: ...

def load_model(bento_model: str | Tag | Model,
               *args: Any, **kwargs: Any) -> transformers.Pipeline | TransformersPreTrained: ...

def import_model(name: Tag | str,
                 model_name_or_path: str | os.PathLike[str],
                 *, proxies: dict | None = None,
                 revision: str = "main",
                 force_download: bool = False,
                 trust_remote_code: bool = False,
                 clone_repository: bool = False,
                 sync_with_hub_version: bool = False,
                 signatures: ModelSignaturesType | None = None,
                 labels: dict | None = None,
                 metadata: dict | None = None,
                 **extra_hf_hub_kwargs: dict) -> bentoml.Model: ...

def save_model(name: Tag | str,
               pretrained_or_pipeline: TransformersPreTrained | Pipeline | None = None,
               pipeline: Pipeline | None = None,
               task_name: str | None = None,
               task_definition: dict | TaskDefinition | None = None,
               *, signatures: ModelSignaturesType | None = None,
               labels: dict | None = None,
               metadata: dict | None = None,
               **save_kwargs: Any) -> bentoml.Model: ...

def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...

def register_pipeline(task: str, impl: type[Pipeline], ...) -> None: ...
def delete_pipeline(task: str) -> None: ...
def make_default_signatures(pretrained_cls: Any) -> ModelSignaturesType: ...

Import

import bentoml

# Via public API
model = bentoml.transformers.save_model(...)
pipeline = bentoml.transformers.load_model(...)
model = bentoml.transformers.import_model(...)

I/O Contract

Inputs

save_model()

Name	Type	Required	Description
name	Tag or str	Yes	Name/tag for the model in the BentoML store
pretrained_or_pipeline	Pipeline, PreTrained, or PreTrainedProtocol	Yes	The Transformers object to save
task_name	str or None	No	Pipeline task name (required for custom pipelines)
task_definition	dict or TaskDefinition or None	No	Task definition for custom pipelines (requires impl, pt/tf, default, type keys)
signatures	ModelSignaturesType or None	No	Inference method signatures (inferred from the object's class when omitted)
labels	dict[str, str] or None	No	User-defined labels
custom_objects	dict[str, Any] or None	No	Additional objects to serialize
external_modules	List[ModuleType] or None	No	Additional Python modules for custom pipelines
metadata	dict[str, Any] or None	No	Custom metadata
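
For custom pipelines, the task_definition row above lists the keys that must be present. A hypothetical validator, check_task_definition(), could report what is missing before calling save_model(). It treats "pt/tf" as "at least one of pt or tf", which is an interpretation, and the placeholder values stand in for the real pipeline and auto-model classes.

```python
from typing import Any, Dict, List


def check_task_definition(task_definition: Dict[str, Any]) -> List[str]:
    """Hypothetical validator: return a list of problems (empty means it looks complete)."""
    problems = [
        f"missing key: {key!r}"
        for key in ("impl", "default", "type")
        if key not in task_definition
    ]
    # At least one framework's auto-model classes must be provided.
    if not task_definition.get("pt") and not task_definition.get("tf"):
        problems.append("expected 'pt' and/or 'tf' auto-model classes")
    return problems


class MyPipeline:  # placeholder for a custom transformers.Pipeline subclass
    pass


definition = {
    "impl": MyPipeline,
    "pt": ("AutoModelForSequenceClassification",),  # placeholder for real classes
    "default": {},
    "type": "text",
}
print(check_task_definition(definition))
print(check_task_definition({"impl": MyPipeline, "type": "text"}))
```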

import_model()

Name	Type	Required	Description
name	Tag or str	Yes	Name/tag for the model in the BentoML store
model_name_or_path	str or PathLike	Yes	HuggingFace repo ID or local directory path
proxies	dict or None	No	Proxy servers for download
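revision	str	No	Git revision on the Hub: branch name, tag, or commit hash (default: "main")
force_download	bool	No	Re-download files even if they are already cached (default: False)
trust_remote_code	bool	No	Allow custom code from Hub (default: False)
clone_repository	bool	No	Download the entire repository via snapshot_download() (default: False)
sync_with_hub_version	bool	No	Use the Hub commit hash as the BentoML model version (default: False)
signatures	ModelSignaturesType or None	No	Inference method signatures
labels	dict or None	No	User-defined labels
metadata	dict or None	No	Custom metadata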
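
With sync_with_hub_version=True, the stored version tracks the Hub commit hash, so a model in the store can be traced back to the exact revision it came from. The sketch below assumes a full 40-character commit hash and a "name:version" tag string; model_tag_sketch() is hypothetical, and the random fallback version only stands in for however BentoML generates versions itself.

```python
import re
import uuid
from typing import Optional

# A full Git commit hash is 40 lowercase hexadecimal characters.
_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def model_tag_sketch(name: str, commit_hash: Optional[str], sync_with_hub_version: bool) -> str:
    """Hypothetical: build a "name:version" tag string for an imported model."""
    if sync_with_hub_version:
        if commit_hash is None or not _COMMIT_RE.fullmatch(commit_hash):
            raise ValueError(f"expected a 40-character commit hash, got {commit_hash!r}")
        version = commit_hash  # version mirrors the exact Hub revision
    else:
        version = uuid.uuid4().hex[:16]  # stand-in for a store-generated version
    return f"{name}:{version}"


print(model_tag_sketch("my_t5_model", "0123456789abcdef" * 2 + "01234567", True))
```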

load_model()

Name	Type	Required	Description
bento_model	str, Tag, or Model	Yes	Tag or Model instance to load
args	Any	No	Additional positional arguments forwarded to PreTrained.from_pretrained()
kwargs	Any	No	Additional keyword arguments forwarded to the pipeline or to from_pretrained()
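
load_model() keeps models saved under the older API version loadable. One way to express that is a dispatch table keyed by a stored API version. In this sketch, load_model_sketch(), the placeholder loaders, the info dict layout, and the fallback to "v1" when no version is recorded are all assumptions, not BentoML internals.

```python
from typing import Any, Callable, Dict


def _load_v1(info: Dict[str, Any]) -> str:
    # Placeholder for the legacy (v1) loading path.
    return f"v1 pipeline for task {info['task']!r}"


def _load_v2(info: Dict[str, Any]) -> str:
    # Placeholder for the current (v2) loading path.
    return f"v2 object of class {info['class']!r}"


_LOADERS: Dict[str, Callable[[Dict[str, Any]], str]] = {"v1": _load_v1, "v2": _load_v2}


def load_model_sketch(info: Dict[str, Any]) -> str:
    """Hypothetical: dispatch on the API version recorded with the saved model."""
    api_version = info.get("api_version", "v1")  # assume v1 when nothing is recorded
    loader = _LOADERS.get(api_version)
    if loader is None:
        raise ValueError(f"unsupported api_version: {api_version!r}")
    return loader(info)


print(load_model_sketch({"api_version": "v2", "class": "AutoModelForCausalLM"}))
print(load_model_sketch({"task": "text-generation"}))
```

A dispatch table keeps each version's loading logic isolated, so adding a future version means adding one entry rather than editing a chain of conditionals.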

Outputs

Method	Return Type	Description
save_model()	bentoml.Model	A BentoML Model referencing the saved Transformers object
import_model()	bentoml.Model	A BentoML Model referencing the imported Transformers model
load_model()	Pipeline or TransformersPreTrained	The loaded pipeline or pre-trained object
get()	Model	The BentoML Model reference from the store
get_runnable()	type[Runnable]	A TransformersRunnable class with framework-aware GPU support
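
get_runnable() is described as placing models on GPUs in a framework-aware way. The sketch below only illustrates the shape of that decision: pick_device_sketch(), the round-robin assignment of workers to GPUs, and the "gpu:N" label for Flax are assumptions, and the actual TransformersRunnable may assign devices differently. The PyTorch and TensorFlow strings follow those frameworks' usual device naming.

```python
def pick_device_sketch(framework: str, gpu_count: int, worker_index: int = 0) -> str:
    """Hypothetical: choose a device string for one runner worker."""
    if framework not in ("pt", "tf", "flax"):
        raise ValueError(f"unknown framework: {framework!r}")
    if gpu_count <= 0:
        return "cpu"  # no visible GPUs: fall back to CPU
    gpu = worker_index % gpu_count  # round-robin workers across GPUs
    if framework == "pt":
        return f"cuda:{gpu}"  # PyTorch device string
    if framework == "tf":
        return f"/device:GPU:{gpu}"  # TensorFlow device name
    return f"gpu:{gpu}"  # Flax/JAX: label only; JAX exposes devices via jax.devices()


for fw in ("pt", "tf", "flax"):
    print(fw, pick_device_sketch(fw, gpu_count=2, worker_index=3))
```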

Usage Examples

import bentoml
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Save a pipeline
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
bento_model = bentoml.transformers.save_model(
    "text-generation-pipeline", generator
)

# Import a model from HuggingFace Hub
bentoml.transformers.import_model(
    "my_t5_model", "t5-base",
    signatures={"__call__": {"batchable": False}},
)

# Load the pipeline
loaded_pipeline = bentoml.transformers.load_model("text-generation-pipeline:latest")
result = loaded_pipeline("Hello, world!")

# Save a pre-trained model directly (reusing the model loaded above)
bento_model = bentoml.transformers.save_model("distilgpt2-model", model)
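
The save_model() description says pre-trained objects are stored as a class reference plus weights written by save_pretrained(). The toy round trip below sketches that split. ToyPretrained, save_sketch(), load_sketch(), and the JSON file names are hypothetical, and a name-based registry replaces cloudpickle so the example stays stdlib-only.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical name -> class registry; BentoML itself pickles the class with cloudpickle.
_CLASSES = {}


def register(cls):
    _CLASSES[cls.__name__] = cls
    return cls


@register
class ToyPretrained:
    """Hypothetical stand-in for a PreTrained object with save/load methods."""

    def __init__(self, weights):
        self.weights = weights

    def save_pretrained(self, directory: Path) -> None:
        (directory / "weights.json").write_text(json.dumps(self.weights))

    @classmethod
    def from_pretrained(cls, directory: Path) -> "ToyPretrained":
        return cls(json.loads((directory / "weights.json").read_text()))


def save_sketch(obj, directory: Path) -> None:
    # Record which class to rebuild, separately from the weights themselves.
    (directory / "class.json").write_text(json.dumps({"class": type(obj).__name__}))
    obj.save_pretrained(directory)


def load_sketch(directory: Path):
    # Resolve the class first, then let it restore its own weights.
    class_name = json.loads((directory / "class.json").read_text())["class"]
    return _CLASSES[class_name].from_pretrained(directory)


with tempfile.TemporaryDirectory() as tmp:
    save_sketch(ToyPretrained({"w": [0.5, -1.0]}), Path(tmp))
    restored = load_sketch(Path(tmp))

print(type(restored).__name__, restored.weights)
```

Keeping the class reference apart from the weights lets loading delegate to the object's own from_pretrained(), which is the same division of labor the save_model() description outlines.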
