Implementation:Bentoml BentoML HuggingFaceModel Descriptor

From Leeroopedia
Last Updated 2026-02-13 15:00 GMT

Overview

Concrete model descriptor class for loading models from the HuggingFace Hub into a BentoML service. The HuggingFaceModel class is an attrs-based descriptor that lazily downloads model files from the Hub on first access and returns the local snapshot path.
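
The "resolve lazily on first access" behavior described above can be pictured with a plain Python descriptor. This is a minimal sketch, not BentoML's implementation: the class `LazyResolved`, the `DemoService` class, and the path string are hypothetical stand-ins, and the lambda takes the place of a real download.

```python
from typing import Callable, Optional


class LazyResolved:
    """Hypothetical stand-in: resolves a value once, on first instance access."""

    def __init__(self, resolver: Callable[[], str]) -> None:
        self._resolver = resolver
        self._value: Optional[str] = None
        self.calls = 0  # counts how often the resolver actually ran

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        if self._value is None:
            self.calls += 1
            self._value = self._resolver()  # the expensive step runs only once
        return self._value


class DemoService:
    # The lambda stands in for an expensive download; the path is made up.
    model_path = LazyResolved(lambda: "/tmp/hypothetical-snapshot")


svc = DemoService()
first = svc.model_path   # triggers the resolver
second = svc.model_path  # served from the cached value
```

Nothing is resolved when the class body is evaluated; the work is deferred until an instance first reads the attribute, and later reads reuse the cached result.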

Description

HuggingFaceModel is a subclass of Model[str] that encapsulates all information needed to locate and download a model from the HuggingFace Hub. It stores the model ID, revision (branch or commit), an optional custom Hub endpoint, and include/exclude glob patterns for selective file downloads.

When resolve() is called, the descriptor uses the huggingface_hub library to download (or retrieve from cache) the model snapshot and returns the absolute local path to the downloaded files. The to_info() method captures the model identity and hash into a BentoModelInfo for reproducible Bento builds.
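
As a rough illustration of how the descriptor's fields could feed a Hub download, the sketch below maps them onto the keyword arguments of `huggingface_hub.snapshot_download`. The mapping (`include` to `allow_patterns`, `exclude` to `ignore_patterns`) and the helper names `snapshot_kwargs` and `download_snapshot` are assumptions made for illustration; the page does not show what resolve() does internally.

```python
from typing import Any, Dict, List, Optional


def snapshot_kwargs(
    model_id: str,
    revision: str = "main",
    endpoint: Optional[str] = None,
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Map descriptor-style fields onto snapshot_download keyword arguments.

    The field-to-argument mapping is an assumption made for illustration.
    """
    return {
        "repo_id": model_id,
        "revision": revision,
        "endpoint": endpoint,        # None means the library's default Hub
        "allow_patterns": include,   # None means no include filter
        "ignore_patterns": exclude,  # None means no exclude filter
    }


def download_snapshot(**fields: Any) -> str:
    """Download (or reuse the cached) snapshot and return its local path."""
    # Imported lazily so snapshot_kwargs works without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**snapshot_kwargs(**fields))


kwargs = snapshot_kwargs(
    "google-bert/bert-base-uncased",
    include=["*.safetensors", "config.json"],
)
```

Calling `download_snapshot(model_id="google-bert/bert-base-uncased")` would need network access and the `huggingface_hub` package; building the keyword arguments alone does not.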

The class is decorated with @attrs.define(unsafe_hash=True), making instances hashable and usable as dictionary keys or set members, which is important for deduplication during build-time model collection.
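
To show why hashability matters for deduplication, here is a small sketch that uses the standard library's `dataclass(unsafe_hash=True)` as an analog of `@attrs.define(unsafe_hash=True)`; both generate `__hash__` from the fields. `ModelRef` is a hypothetical stand-in with only hashable fields, the revision `"v1-hypothetical"` is made up, and how BentoML actually collects models at build time is not shown in this page.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class ModelRef:
    """Hypothetical stand-in for a hashable model reference."""

    model_id: str
    revision: str = "main"


collected = [
    ModelRef("google-bert/bert-base-uncased"),
    ModelRef("google-bert/bert-base-uncased"),  # duplicate declaration
    ModelRef("google-bert/bert-base-uncased", "v1-hypothetical"),  # other revision
]

# Equal field values give equal hashes, so a set keeps one entry per
# distinct (model_id, revision) pair.
unique = set(collected)
```

Two references with the same model ID and revision collapse into one entry, while a different revision stays separate.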

Usage

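Import the class and declare an instance as a class attribute on a @bentoml.service class (see Example 1). Outside a service, the reference can also be created and resolved directly: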

from bentoml.models import HuggingFaceModel

model_ref = HuggingFaceModel("google-bert/bert-base-uncased")
local_path = model_ref.resolve()  # Downloads if needed, returns str path

Code Reference

Source Location

  • Repository: bentoml/BentoML
  • File: src/_bentoml_sdk/models/huggingface.py (lines 28-160)

Signature

@attrs.define(unsafe_hash=True)
class HuggingFaceModel(Model[str]):
    model_id: str
    revision: str = "main"
    endpoint: Optional[str] = None
    include: Optional[List[str]] = None
    exclude: Optional[List[str]] = None

Key Methods

def resolve(self) -> str:
    """Download the model snapshot (if not cached) and return
    the absolute local path to the model directory."""
    ...

def to_info(self) -> BentoModelInfo:
    """Return a BentoModelInfo capturing model identity and hash
    for inclusion in a Bento manifest."""
    ...

Import

from bentoml.models import HuggingFaceModel

I/O Contract

Inputs

Input Contract

| Name | Type | Description |
| --- | --- | --- |
| model_id | str | HuggingFace model identifier (e.g., "google-bert/bert-base-uncased", "meta-llama/Llama-2-7b-hf"). |
| revision | str | Git revision (branch name, tag, or commit SHA). Defaults to "main". |
| endpoint | Optional[str] | Custom HuggingFace Hub endpoint URL. Defaults to None (uses the public Hub). |
| include | Optional[List[str]] | Glob patterns for files to include in the download (e.g., ["*.safetensors", "config.json"]). |
| exclude | Optional[List[str]] | Glob patterns for files to exclude from the download. |

Outputs

Output Contract

| Name | Type | Description |
| --- | --- | --- |
| resolve() return value | str | Absolute local filesystem path to the downloaded model snapshot directory. |
| to_info() return value | BentoModelInfo | Metadata object capturing the model identity, revision, and content hash for Bento manifests. |

Usage Examples

Example 1: Basic HuggingFace Model Reference

Load a BERT model and return the embedding of the first ([CLS]) token for an input text.

import bentoml
from bentoml.models import HuggingFaceModel

@bentoml.service
class BertService:
    model_ref = HuggingFaceModel("google-bert/bert-base-uncased")

    def __init__(self) -> None:
        from transformers import AutoModel, AutoTokenizer
        # Instance access goes through the descriptor, which resolves the
        # reference and returns the local snapshot path as a str.
        path = self.model_ref
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModel.from_pretrained(path)

    @bentoml.api
    def embed(self, text: str) -> list[float]:
        inputs = self.tokenizer(text, return_tensors="pt")
        outputs = self.model(**inputs)
        # Hidden state of the first token ([CLS]) of the first sequence.
        return outputs.last_hidden_state[0][0].tolist()
  • model_ref is declared as a class attribute, so BentoML discovers it during bentoml build.
  • Accessing self.model_ref inside __init__ resolves the reference, triggering the download on first worker startup if the snapshot is not already cached.

Example 2: Selective File Download

Download only safetensors weights and config files to reduce download size.

from bentoml.models import HuggingFaceModel

model_ref = HuggingFaceModel(
    "meta-llama/Llama-2-7b-hf",
    revision="main",
    include=["*.safetensors", "config.json", "tokenizer.*"],
    exclude=["*.bin"],
)
  • include limits downloads to safetensors, config, and tokenizer files.
  • exclude ensures legacy .bin weight files are skipped.
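
To make the effect of the include/exclude patterns concrete, the sketch below filters a made-up file listing with the standard library's `fnmatch`. It assumes fnmatch-style glob semantics, where a file is kept if it matches at least one include pattern and no exclude pattern. The helper `select_files` and the file names are hypothetical, and the real filtering is done by the Hub client, not by this code.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def select_files(
    names: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep names matching any include pattern and no exclude pattern."""
    kept = []
    for name in names:
        if include is not None and not any(fnmatch(name, p) for p in include):
            continue  # an include list was given and nothing matched
        if exclude is not None and any(fnmatch(name, p) for p in exclude):
            continue  # explicitly excluded
        kept.append(name)
    return kept


# Made-up repository listing for illustration only.
repo_files = [
    "config.json",
    "model-00001-of-00002.safetensors",
    "pytorch_model-00001-of-00002.bin",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "README.md",
]

selected = select_files(
    repo_files,
    include=["*.safetensors", "config.json", "tokenizer.*"],
    exclude=["*.bin"],
)
```

Under these semantics the pattern `tokenizer.*` does not match `tokenizer_config.json`, so that file would need its own pattern. The `.bin` file is already dropped by the include list, which makes the exclude entry a safeguard, not a requirement, in this particular listing.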
