Workflow: BentoML Model Store Management
| Knowledge Sources | |
|---|---|
| Domains | ML_Serving, Model_Management, ML_Ops |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
End-to-end process for saving, loading, versioning, and managing ML model artifacts using BentoML's Model Store and cloud registry.
Description
This workflow covers the complete model lifecycle within BentoML: saving trained models into the local Model Store, loading them in services, and sharing them across teams via export/import or BentoCloud. The Model Store provides a local filesystem-backed repository with tag-based versioning (name:version), metadata tracking, and framework-agnostic storage. Models stored there are automatically referenced when building Bentos and are deployed alongside services.
Key capabilities covered:
- Saving models to the local Model Store with bentoml.models.create()
- Loading models from HuggingFace Hub via HuggingFaceModel
- Loading models from the Model Store via BentoModel
- Tag-based model versioning and management
- Model export/import for portability
- Push/pull to BentoCloud for team collaboration
Usage
Execute this workflow when you need to manage model artifacts for BentoML services. This includes saving fine-tuned or custom-trained models, loading pre-trained models from HuggingFace, managing model versions, or sharing models across development and production environments.
Execution Steps
Step 1: Save a Model to the Store
Use bentoml.models.create() as a context manager to register a model in the local Model Store. Within the context, save the model files to the provided path. The Model Store assigns a unique version tag and stores the model in a structured directory. Models can be saved from any framework (PyTorch, TensorFlow, scikit-learn, etc.) by writing their serialized files to the provided path.
Key considerations:
- The context manager ensures proper cleanup if saving fails
- Model metadata (labels, custom metadata) can be attached during creation
- The default store location is ~/bentoml/models/
- Each model version is immutable once saved
- Use model_ref.path to get the directory for saving model files
Step 2: Load a Model in a Service
Declare model references at the class level using BentoModel (for Model Store models) or HuggingFaceModel (for HuggingFace Hub models). Class-level declaration is critical because it registers the model as a service dependency, ensuring it is included when building a Bento. In the constructor, use the model path to load the actual model into memory.
Key considerations:
- BentoModel("name:version") loads from the local store or BentoCloud
- HuggingFaceModel("org/model-id") downloads from HuggingFace Hub
- Models MUST be declared as class attributes, not inside __init__
- BentoModel returns a Model object with a path_of() method
- HuggingFaceModel returns the downloaded model path as a string
- On BentoCloud, models are pre-downloaded during image build for fast cold starts
Step 3: Version and Organize Models
Use tag-based versioning to maintain a clear record of model iterations. Each saved model receives a name:version tag where the version is auto-generated. List, inspect, and manage models using CLI commands or Python APIs. Attach labels and metadata to models for organization.
Key considerations:
- bentoml models list shows all stored models with tags, sizes, and dates
- bentoml models get <tag> retrieves detailed model information
- The :latest alias always points to the most recently saved version
- Labels enable filtering and categorization (e.g., by project or stage)
- Model metadata stores arbitrary key-value pairs (hyperparameters, metrics)
Step 4: Export and Import Models
Export models as standalone archive files (.bentomodel) for sharing between machines or build stages. Import previously exported models into the local Model Store. Both operations support local filesystem paths and remote storage (S3, GCS, FTP) for team-scale model sharing.
Key considerations:
- bentoml models export <tag> <path> creates a portable archive
- bentoml models import <path> loads an archive into the local store
- Remote storage URLs are supported (s3://, gs://, ftp://)
- The fs-s3fs package is required for S3 support
- Python APIs (bentoml.models.export_model, import_model) provide programmatic access
Step 5: Sync with BentoCloud
Push models to BentoCloud for centralized storage and team collaboration. Pull models from BentoCloud to local development environments. BentoCloud provides a web console for browsing and managing all shared models with access control.
Key considerations:
- bentoml models push <tag> uploads to BentoCloud registry
- bentoml models pull <tag> downloads from BentoCloud
- Requires BentoCloud authentication (bentoml cloud login)
- Models on BentoCloud are accessible to all team members with appropriate permissions
- BentoCloud accelerates deployment by caching models close to compute
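The push/pull flow in Step 5 is CLI-driven and requires a live BentoCloud account, so the following is an untested sketch; the tag demo_model:latest is a placeholder:

```shell
# Authenticate once per machine (token from the BentoCloud console).
bentoml cloud login

# Upload a local model version to the BentoCloud registry.
bentoml models push demo_model:latest

# On another machine: fetch the shared model into the local store.
bentoml models pull demo_model:latest
```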
Step 6: Clean Up Models
Remove models that are no longer needed from the local store to free disk space. Use the CLI or Python API to delete specific model versions or all versions of a model.
Key considerations:
- bentoml models delete <tag> removes a specific version
- Use -y flag to skip confirmation prompt
- Deletion from the local store does not affect BentoCloud copies
- Ensure no active Bentos reference a model before deleting it