Implementation:Bentoml BentoML HuggingFaceModel Descriptor

**Metadata**
Knowledge Sources	BentoML BentoML HuggingFace Integration
Domains	ML_Serving Model_Management
Last Updated	2026-02-13 15:00 GMT

Overview

Concrete model descriptor class for loading models from the HuggingFace Hub into a BentoML service. The HuggingFaceModel class is an attrs-based descriptor that lazily downloads model files from the Hub on first access and returns the local snapshot path.

Description

HuggingFaceModel is a subclass of Model[str] that encapsulates all information needed to locate and download a model from the HuggingFace Hub. It stores the model ID, revision (branch or commit), an optional custom Hub endpoint, and include/exclude glob patterns for selective file downloads.

When resolve() is called, the descriptor uses the huggingface_hub library to download (or retrieve from cache) the model snapshot and returns the absolute local path to the downloaded files. The to_info() method captures the model identity and hash into a BentoModelInfo for reproducible Bento builds.
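
The download step can be approximated as follows. This is a minimal sketch, not BentoML's implementation: it assumes the descriptor fields map onto the `revision`, `endpoint`, `allow_patterns`, and `ignore_patterns` arguments of `huggingface_hub.snapshot_download`, and the helper names `snapshot_kwargs` and `download_snapshot` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def snapshot_kwargs(
    model_id: str,
    revision: str = "main",
    endpoint: Optional[str] = None,
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Map descriptor-style fields to snapshot_download keyword arguments.

    Hypothetical helper: the field-to-argument mapping is an assumption.
    """
    return {
        "repo_id": model_id,
        "revision": revision,
        "endpoint": endpoint,
        "allow_patterns": include,
        "ignore_patterns": exclude,
    }


def download_snapshot(**fields: Any) -> str:
    """Download (or reuse the cached) snapshot and return its local path."""
    # Imported lazily so snapshot_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**snapshot_kwargs(**fields))
```

Calling `download_snapshot(model_id="google-bert/bert-base-uncased")` would need network access on the first call; later calls are expected to reuse the local huggingface_hub cache.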

The class is decorated with @attrs.define(unsafe_hash=True), making instances hashable and usable as dictionary keys or set members, which is important for deduplication during build-time model collection.
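
The effect of hashability on deduplication can be illustrated with a small stand-in. The sketch below uses the standard library's `dataclasses` (which accepts the same `unsafe_hash=True` flag) instead of attrs, and the `ModelRef` class is hypothetical; it omits the `include`/`exclude` fields so that every field value is hashable.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class ModelRef:
    """Hypothetical stand-in for a hashable model descriptor."""

    model_id: str
    revision: str = "main"


# Two references to the same model and revision compare equal and hash alike.
collected = {
    ModelRef("google-bert/bert-base-uncased"),
    ModelRef("google-bert/bert-base-uncased", revision="main"),
    ModelRef("meta-llama/Llama-2-7b-hf"),
}

# The set keeps one entry per distinct (model_id, revision) pair.
unique_ids = sorted(ref.model_id for ref in collected)
```

Here `collected` ends up with two entries, because the first two descriptors are equal field by field.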

Usage

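Import the class and create a descriptor. Inside a service, declare it as a class attribute on a @bentoml.service class (see Example 1); it can also be resolved directly: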

from bentoml.models import HuggingFaceModel

model_ref = HuggingFaceModel("google-bert/bert-base-uncased")
local_path = model_ref.resolve()  # Downloads if needed, returns str path

Code Reference

Source Location

Repository: bentoml/BentoML
File: src/_bentoml_sdk/models/huggingface.py (lines 28--160)

Signature

@attrs.define(unsafe_hash=True)
class HuggingFaceModel(Model[str]):
    model_id: str
    revision: str = "main"
    endpoint: Optional[str] = None
    include: Optional[List[str]] = None
    exclude: Optional[List[str]] = None

Key Methods

def resolve(self) -> str:
    """Download the model snapshot (if not cached) and return
    the absolute local path to the model directory."""
    ...

def to_info(self) -> BentoModelInfo:
    """Return a BentoModelInfo capturing model identity and hash
    for inclusion in a Bento manifest."""
    ...

Import

from bentoml.models import HuggingFaceModel

I/O Contract

Inputs

**Input Contract**
Name	Type	Description
`model_id`	str	HuggingFace model identifier (e.g., `"google-bert/bert-base-uncased"`, `"meta-llama/Llama-2-7b-hf"`).
`revision`	str	Git revision (branch name, tag, or commit SHA). Defaults to `"main"`.
`endpoint`	Optional[str]	Custom HuggingFace Hub endpoint URL. Defaults to `None` (uses the public Hub).
`include`	Optional[List[str]]	Glob patterns for files to include in the download (e.g., `["*.safetensors", "config.json"]`).
`exclude`	Optional[List[str]]	Glob patterns for files to exclude from the download.

Outputs

**Output Contract**
Name	Type	Description
`resolve()` return value	str	Absolute local filesystem path to the downloaded model snapshot directory.
`to_info()` return value	BentoModelInfo	Metadata object capturing the model identity, revision, and content hash for Bento manifests.

Usage Examples

Example 1: Basic HuggingFace Model Reference

Load a BERT model and return the embedding of the first ([CLS]) token for an input text.

import bentoml
from bentoml.models import HuggingFaceModel

@bentoml.service
class BertService:
    model_ref = HuggingFaceModel("google-bert/bert-base-uncased")

    def __init__(self):
        from transformers import AutoModel, AutoTokenizer
        path = self.model_ref.resolve()
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModel.from_pretrained(path)

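    @bentoml.api
    def embed(self, text: str) -> list:
        import torch

        inputs = self.tokenizer(text, return_tensors="pt")
        with torch.no_grad():  # inference only; skip gradient tracking
            outputs = self.model(**inputs)
        # Hidden state of the first token ([CLS]) of the first sequence
        return outputs.last_hidden_state[0][0].tolist()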

model_ref is declared as a class attribute -- BentoML discovers it during bentoml build.
resolve() is called inside __init__, triggering the download on first worker startup.

Example 2: Selective File Download

Download only safetensors weights, the config file, and tokenizer files to reduce download size.

from bentoml.models import HuggingFaceModel

model_ref = HuggingFaceModel(
    "meta-llama/Llama-2-7b-hf",
    revision="main",
    include=["*.safetensors", "config.json", "tokenizer.*"],
    exclude=["*.bin"],
)

include limits downloads to safetensors, config, and tokenizer files.
exclude ensures legacy .bin weight files are skipped.
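
To see which files such patterns would select, the filtering can be sketched with the standard library's `fnmatch`. This is an illustration under the assumption that patterns are matched fnmatch-style against repository file names; the `select_files` helper and the file list are hypothetical, not taken from BentoML or from the actual repository contents.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def select_files(
    filenames: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep names matching any include pattern and no exclude pattern."""
    selected = []
    for name in filenames:
        # With include set, a file must match at least one include pattern.
        if include is not None and not any(fnmatch(name, p) for p in include):
            continue
        # A file matching any exclude pattern is dropped.
        if exclude is not None and any(fnmatch(name, p) for p in exclude):
            continue
        selected.append(name)
    return selected


# Hypothetical file listing for illustration only.
repo_files = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "pytorch_model-00001-of-00002.bin",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "README.md",
]

kept = select_files(
    repo_files,
    include=["*.safetensors", "config.json", "tokenizer.*"],
    exclude=["*.bin"],
)
```

Under this matching, `tokenizer.*` selects `tokenizer.json` and `tokenizer.model` but not `tokenizer_config.json`; a broader pattern such as `tokenizer*` would be needed to pick that file up as well.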

Related Pages

Principle:Bentoml_BentoML_Model_Loading_For_Serving
