Implementation:Bentoml BentoML HuggingFaceModel Descriptor
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Concrete model descriptor class for loading models from the HuggingFace Hub into a BentoML service. The HuggingFaceModel class is an attrs-based descriptor that lazily downloads model files from the Hub on first access and returns the local snapshot path.
Description
HuggingFaceModel is a subclass of Model[str] that encapsulates all information needed to locate and download a model from the HuggingFace Hub. It stores the model ID, the revision (branch, tag, or commit), an optional custom Hub endpoint, and include/exclude glob patterns for selective file downloads.
When resolve() is called, the descriptor uses the huggingface_hub library to download (or retrieve from cache) the model snapshot and returns the absolute local path to the downloaded files. The to_info() method captures the model identity and hash into a BentoModelInfo for reproducible Bento builds.
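The download step can be approximated with `huggingface_hub.snapshot_download`. The sketch below is not BentoML's actual implementation: the `ModelRef` stand-in and the `resolve_snapshot` helper are hypothetical names, and the `download` parameter is an assumption added only so the function can be exercised without network access.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModelRef:
    """Hypothetical stand-in mirroring the descriptor's fields."""
    model_id: str
    revision: str = "main"
    endpoint: Optional[str] = None
    include: Optional[List[str]] = None
    exclude: Optional[List[str]] = None


def resolve_snapshot(
    ref: ModelRef, download: Optional[Callable[..., str]] = None
) -> str:
    """Download (or reuse the cached) snapshot and return its local path."""
    if download is None:
        # Imported lazily so the sketch loads even without huggingface_hub.
        from huggingface_hub import snapshot_download as download
    return download(
        ref.model_id,
        revision=ref.revision,
        endpoint=ref.endpoint,
        allow_patterns=ref.include,   # glob patterns to keep
        ignore_patterns=ref.exclude,  # glob patterns to skip
    )
```

Calling `resolve_snapshot(ModelRef("google-bert/bert-base-uncased"))` would contact the Hub on the first call and reuse the local cache afterwards.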
The class is decorated with @attrs.define(unsafe_hash=True), making instances hashable and usable as dictionary keys or set members, which is important for deduplication during build-time model collection.
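Why hashability matters for deduplication can be shown with a small stand-in. The sketch below uses the standard library's `dataclass(unsafe_hash=True)`, which generates a field-based hash in the same spirit as the attrs option. `HashableRef` is a hypothetical class with only two of the fields, not the real descriptor.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class HashableRef:
    """Hypothetical stand-in; compares and hashes by field values."""
    model_id: str
    revision: str = "main"


# Two declarations of the same model produce equal, equally hashed references.
declared = [
    HashableRef("google-bert/bert-base-uncased"),
    HashableRef("google-bert/bert-base-uncased"),
    HashableRef("meta-llama/Llama-2-7b-hf"),
]

# Collecting into a set keeps one entry per distinct model reference.
unique = set(declared)
```

Because equal references collapse into a single set entry, a build step that gathers references this way would record each distinct model only once.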
Usage
Import the class, then create a reference (in a service, typically declared as a class attribute on a @bentoml.service class):
from bentoml.models import HuggingFaceModel
model_ref = HuggingFaceModel("google-bert/bert-base-uncased")
local_path = model_ref.resolve() # Downloads if needed, returns str path
Code Reference
Source Location
- Repository: bentoml/BentoML
- File: src/_bentoml_sdk/models/huggingface.py (lines 28-160)
Signature
@attrs.define(unsafe_hash=True)
class HuggingFaceModel(Model[str]):
    model_id: str
    revision: str = "main"
    endpoint: Optional[str] = None
    include: Optional[List[str]] = None
    exclude: Optional[List[str]] = None
Key Methods
def resolve(self) -> str:
    """Download the model snapshot (if not cached) and return
    the absolute local path to the model directory."""
    ...

def to_info(self) -> BentoModelInfo:
    """Return a BentoModelInfo capturing model identity and hash
    for inclusion in a Bento manifest."""
    ...
Import
from bentoml.models import HuggingFaceModel
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| model_id | str | HuggingFace model identifier (e.g., "google-bert/bert-base-uncased", "meta-llama/Llama-2-7b-hf"). |
| revision | str | Git revision (branch name, tag, or commit SHA). Defaults to "main". |
| endpoint | Optional[str] | Custom HuggingFace Hub endpoint URL. Defaults to None (uses the public Hub). |
| include | Optional[List[str]] | Glob patterns for files to include in the download (e.g., ["*.safetensors", "config.json"]). |
| exclude | Optional[List[str]] | Glob patterns for files to exclude from the download. |
Outputs
| Name | Type | Description |
|---|---|---|
| resolve() return value | str | Absolute local filesystem path to the downloaded model snapshot directory. |
| to_info() return value | BentoModelInfo | Metadata object capturing the model identity, revision, and content hash for Bento manifests. |
Usage Examples
Example 1: Basic HuggingFace Model Reference
Load a BERT model for text classification.
import bentoml
from bentoml.models import HuggingFaceModel

@bentoml.service
class BertService:
    model_ref = HuggingFaceModel("google-bert/bert-base-uncased")

    def __init__(self):
        from transformers import AutoModel, AutoTokenizer

        # Instance access goes through the descriptor, which resolves
        # the reference and yields the local snapshot path (a str).
        path = self.model_ref
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModel.from_pretrained(path)

    @bentoml.api
    def embed(self, text: str) -> list:
        inputs = self.tokenizer(text, return_tensors="pt")
        outputs = self.model(**inputs)
        # Embedding of the first token ([CLS]) of the first sequence
        return outputs.last_hidden_state[0][0].tolist()

- model_ref is declared as a class attribute; BentoML discovers it during bentoml build.
- Accessing self.model_ref inside __init__ resolves the descriptor, triggering the download on first worker startup.
Example 2: Selective File Download
Download only safetensors weights and config files to reduce download size.
from bentoml.models import HuggingFaceModel
model_ref = HuggingFaceModel(
    "meta-llama/Llama-2-7b-hf",
    revision="main",
    include=["*.safetensors", "config.json", "tokenizer.*"],
    exclude=["*.bin"],
)
- include limits downloads to safetensors, config, and tokenizer files.
- exclude ensures legacy .bin weight files are skipped.
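To make the effect of the two pattern lists concrete, the sketch below filters a made-up file listing with `fnmatch`. It assumes a file is kept when it matches at least one include pattern and no exclude pattern. The helper name `select_files` and the file names are hypothetical, and the real filtering is performed by the download library rather than by this code.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def select_files(
    names: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep names matching any include pattern and no exclude pattern."""
    selected = []
    for name in names:
        if include is not None and not any(fnmatch(name, p) for p in include):
            continue  # not covered by the include list
        if exclude is not None and any(fnmatch(name, p) for p in exclude):
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Illustrative repository listing (not taken from a real repository).
repo_files = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "pytorch_model-00001-of-00002.bin",
    "tokenizer.json",
    "tokenizer.model",
    "README.md",
]

kept = select_files(
    repo_files,
    include=["*.safetensors", "config.json", "tokenizer.*"],
    exclude=["*.bin"],
)
```

In this listing the .bin file is already rejected by the include list, so the exclude pattern acts as a safeguard in case the include list is later widened.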