Implementation:Bentoml BentoML Framework Transformers
| Knowledge Sources | |
|---|---|
| Domains | ML Framework, NLP, Generative AI, Model Serialization |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
The bentoml.transformers module provides BentoML integration for HuggingFace Transformers. It supports saving, importing, loading, and serving Transformers pipelines, pre-trained models, tokenizers, and custom pipelines.
Description
This is one of the broadest BentoML framework adapters, covering a wide range of HuggingFace Transformers objects. It handles both transformers.Pipeline instances and individual pre-trained objects (models, tokenizers, feature extractors, image processors).
ModelOptions (attrs dataclass) captures:
- task: Pipeline task name (e.g., "text-generation", "sentiment-analysis")
- tf, pt: Tuples of auto model class names for TensorFlow and PyTorch
- default: Default model mappings for the task
- type: Pipeline type (text, audio, image, multimodal)
- kwargs: Extra keyword arguments passed to the pipeline
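The fields above can be pictured with a small stand-in class. This is only a sketch: it uses the standard-library dataclasses module rather than attrs, the name ModelOptionsSketch is hypothetical, and the default values are assumptions rather than BentoML's actual defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class ModelOptionsSketch:
    """Hypothetical stand-in for ModelOptions; not BentoML's actual class."""

    task: str = ""                       # pipeline task name
    tf: Tuple[str, ...] = ()             # TensorFlow auto model class names
    pt: Tuple[str, ...] = ()             # PyTorch auto model class names
    default: Dict[str, Any] = field(default_factory=dict)  # default model mappings
    type: str = "text"                   # text, audio, image, or multimodal (assumed default)
    kwargs: Dict[str, Any] = field(default_factory=dict)   # extra pipeline kwargs


# Example: options describing a PyTorch-only text-generation pipeline.
options = ModelOptionsSketch(
    task="text-generation",
    pt=("AutoModelForCausalLM",),
    kwargs={"max_new_tokens": 32},
)
```

Using default_factory for the dict fields keeps each instance's mappings independent, which matters if options are mutated after creation.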
Key functions:
- save_model(): Saves a Pipeline or PreTrained object. For pipelines, saves the pipeline class via cloudpickle and the model weights via save_pretrained(). For pretrained objects, saves the class and weights separately. Supports custom pipeline registration.
- import_model(): Downloads a model from HuggingFace Hub using from_pretrained() or snapshot_download(). Supports clone_repository mode and version syncing with hub commit hashes. Uses init_empty_weights() to avoid loading model weights into memory during import.
- load_model(): Loads models with backward compatibility (v1 and v2 API versions). Handles custom pipeline registration, pretrained protocol loading, and automatic task detection. Supports trust_remote_code.
- get_runnable(): Creates a TransformersRunnable with framework-aware GPU placement (PyTorch, TensorFlow, or Flax/JAX).
- make_default_signatures(): Automatically infers appropriate method signatures based on the pretrained class type (tokenizer, model, image processor, etc.).
- register_pipeline() / delete_pipeline(): Manage custom pipeline registrations in the Transformers pipeline registry.
The module supports both the newer PIPELINE_REGISTRY (transformers >= 4.21) and legacy SUPPORTED_TASKS approaches.
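The choice between the two registry approaches depends on the installed transformers version. The following is a minimal sketch of such a version gate, using only the 4.21 threshold stated above; the helper names parse_version and uses_pipeline_registry are hypothetical, and the module's actual check may be implemented differently.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '4.21.0' or '4.36.0.dev0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def uses_pipeline_registry(transformers_version: str) -> bool:
    """True when the version is at least 4.21 (PIPELINE_REGISTRY approach);
    False means falling back to the legacy SUPPORTED_TASKS approach."""
    return parse_version(transformers_version) >= (4, 21)
```

Comparing integer tuples avoids the pitfall of string comparison, where "4.9" would sort after "4.21".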
Usage
Use this module to save, import, load, and serve any HuggingFace Transformers model or pipeline within BentoML services. It covers NLP, vision, audio, and multimodal tasks.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/_internal/frameworks/transformers.py
- Lines: 1-1260
Signature
def get(tag_like: str | Tag) -> Model: ...
def load_model(bento_model: str | Tag | Model,
    *args: Any, **kwargs: Any) -> transformers.Pipeline | TransformersPreTrained: ...
def import_model(name: Tag | str,
    model_name_or_path: str | os.PathLike[str],
    *, proxies: dict | None = None,
    revision: str = "main",
    force_download: bool = False,
    trust_remote_code: bool = False,
    clone_repository: bool = False,
    sync_with_hub_version: bool = False,
    signatures: ModelSignaturesType | None = None,
    labels: dict | None = None,
    metadata: dict | None = None,
    **extra_hf_hub_kwargs: dict) -> bentoml.Model: ...
def save_model(name: Tag | str,
    pretrained_or_pipeline: TransformersPreTrained | Pipeline | None = None,
    pipeline: Pipeline | None = None,
    task_name: str | None = None,
    task_definition: dict | TaskDefinition | None = None,
    *, signatures: ModelSignaturesType | None = None,
    labels: dict | None = None,
    metadata: dict | None = None,
    **save_kwargs: Any) -> bentoml.Model: ...
def get_runnable(bento_model: bentoml.Model) -> type[bentoml.legacy.Runnable]: ...
def register_pipeline(task: str, impl: type[Pipeline], ...) -> None: ...
def delete_pipeline(task: str) -> None: ...
def make_default_signatures(pretrained_cls: Any) -> ModelSignaturesType: ...
Import
import bentoml
# Via public API
model = bentoml.transformers.save_model(...)
pipeline = bentoml.transformers.load_model(...)
model = bentoml.transformers.import_model(...)
I/O Contract
Inputs
save_model()
| Name | Type | Required | Description |
|---|---|---|---|
| name | Tag or str | Yes | Name/tag for the model in the BentoML store |
| pretrained_or_pipeline | Pipeline, PreTrained, or PreTrainedProtocol | Yes | The Transformers object to save |
| task_name | str or None | No | Pipeline task name (required for custom pipelines) |
| task_definition | dict or TaskDefinition or None | No | Task definition for custom pipelines (requires impl, pt/tf, default, type keys) |
| signatures | ModelSignaturesType or None | No | Inference method signatures (auto-inferred from class type) |
| labels | dict[str, str] or None | No | User-defined labels |
| custom_objects | dict[str, Any] or None | No | Additional objects to serialize |
| external_modules | List[ModuleType] or None | No | Additional Python modules for custom pipelines |
| metadata | dict[str, Any] or None | No | Custom metadata |
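Because task_definition for a custom pipeline needs several keys, it can help to check a dict before passing it to save_model(). The sketch below is illustrative only: missing_task_definition_keys and MyPipelineSketch are hypothetical names, and reading "pt/tf" as "at least one of pt or tf" is an assumption, not confirmed behavior of the library.

```python
from typing import Any, Dict, List


def missing_task_definition_keys(task_definition: Dict[str, Any]) -> List[str]:
    """List the keys from the table above that are absent from task_definition."""
    missing = [key for key in ("impl", "default", "type") if key not in task_definition]
    # Assumption: providing either "pt" or "tf" satisfies the "pt/tf" requirement.
    if "pt" not in task_definition and "tf" not in task_definition:
        missing.append("pt/tf")
    return missing


class MyPipelineSketch:
    """Placeholder for a custom transformers.Pipeline subclass."""


definition = {
    "impl": MyPipelineSketch,
    "pt": ("AutoModelForSequenceClassification",),
    "default": {},
    "type": "text",
}
```

An empty result from missing_task_definition_keys(definition) means every key listed in the table is present; it does not validate the values themselves.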
import_model()
| Name | Type | Required | Description |
|---|---|---|---|
| name | Tag or str | Yes | Name/tag for the model in the BentoML store |
| model_name_or_path | str or PathLike | Yes | HuggingFace repo ID or local directory path |
| proxies | dict or None | No | Proxy servers for download |
| revision | str | No | Model version (default: "main") |
| force_download | bool | No | Force re-download (default: False) |
| trust_remote_code | bool | No | Allow custom code from Hub (default: False) |
| clone_repository | bool | No | Download all files via snapshot_download (default: False) |
| sync_with_hub_version | bool | No | Sync version tag with hub commit hash (default: False) |
load_model()
| Name | Type | Required | Description |
|---|---|---|---|
| bento_model | str, Tag, or Model | Yes | Tag or Model instance to load |
| args | Any | No | Additional args for PreTrained.from_pretrained() |
| kwargs | Any | No | Additional kwargs for pipeline or from_pretrained() |
Outputs
| Method | Return Type | Description |
|---|---|---|
| save_model() | bentoml.Model | A BentoML Model referencing the saved Transformers object |
| import_model() | bentoml.Model | A BentoML Model referencing the imported Transformers model |
| load_model() | Pipeline or TransformersPreTrained | The loaded pipeline or pre-trained object |
| get() | Model | The BentoML Model reference from the store |
| get_runnable() | type[Runnable] | A TransformersRunnable class with framework-aware GPU support |
Usage Examples
import bentoml
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Save a pipeline
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
bento_model = bentoml.transformers.save_model(
    "text-generation-pipeline", generator
)

# Import a model from HuggingFace Hub
bentoml.transformers.import_model(
    "my_t5_model", "t5-base",
    signatures={"__call__": {"batchable": False}},
)

# Load the pipeline
loaded_pipeline = bentoml.transformers.load_model("text-generation-pipeline:latest")
result = loaded_pipeline("Hello, world!")

# Save a pre-trained model directly
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
bento_model = bentoml.transformers.save_model("distilgpt2-model", model)