Workflow: BentoML Model Store Management

From Leeroopedia
Knowledge Sources
Domains ML_Serving, Model_Management, ML_Ops
Last Updated 2026-02-13 15:00 GMT

Overview

End-to-end process for saving, loading, versioning, and managing ML model artifacts using BentoML's Model Store and cloud registry.

Description

This workflow covers complete model lifecycle management within BentoML: saving trained models into the local Model Store, loading them in services, and sharing them across teams via export/import or BentoCloud. The Model Store provides a local filesystem-backed repository with tag-based versioning (name:version), metadata tracking, and framework-agnostic storage. Models stored there are automatically referenced when building Bentos and deployed alongside services.

Key capabilities covered:

  • Saving models to the local Model Store with bentoml.models.create()
  • Loading models from HuggingFace Hub via HuggingFaceModel
  • Loading models from the Model Store via BentoModel
  • Tag-based model versioning and management
  • Model export/import for portability
  • Push/pull to BentoCloud for team collaboration

Usage

Execute this workflow when you need to manage model artifacts for BentoML services. This includes saving fine-tuned or custom-trained models, loading pre-trained models from HuggingFace, managing model versions, or sharing models across development and production environments.

Execution Steps

Step 1: Save a Model to the Store

Use bentoml.models.create() as a context manager to register a model in the local Model Store. Within the context, save the model files to the provided path. The Model Store assigns a unique version tag and stores the model in a structured directory. Models can be saved from any framework (PyTorch, TensorFlow, scikit-learn, etc.) by writing their serialized files to the provided path.

Key considerations:

  • The context manager ensures proper cleanup if saving fails
  • Model metadata (labels, custom metadata) can be attached during creation
  • The default store location is ~/bentoml/models/
  • Each model version is immutable once saved
  • Use model_ref.path to get the directory for saving model files

Step 2: Load a Model in a Service

Declare model references at the class level using BentoModel (for Model Store models) or HuggingFaceModel (for HuggingFace Hub models). Class-level declaration is critical because it registers the model as a service dependency, ensuring it is included when building a Bento. In the constructor, use the model path to load the actual model into memory.

Key considerations:

  • BentoModel("name:version") loads from the local store or BentoCloud
  • HuggingFaceModel("org/model-id") downloads from HuggingFace Hub
  • Models MUST be declared as class attributes, not inside __init__
  • BentoModel returns a Model object with a path_of() method
  • HuggingFaceModel returns the downloaded model path as a string
  • On BentoCloud, models are pre-downloaded during image build for fast cold starts

Step 3: Version and Organize Models

Use tag-based versioning to maintain a clear record of model iterations. Each saved model receives a name:version tag where the version is auto-generated. List, inspect, and manage models using CLI commands or Python APIs. Attach labels and metadata to models for organization.

Key considerations:

  • bentoml models list shows all stored models with tags, sizes, and dates
  • bentoml models get <tag> retrieves detailed model information
  • The :latest alias always points to the most recently saved version
  • Labels enable filtering and categorization (e.g., by project or stage)
  • Model metadata stores arbitrary key-value pairs (hyperparameters, metrics)

Step 4: Export and Import Models

Export models as standalone archive files (.bentomodel) for sharing between machines or build stages. Import previously exported models into the local Model Store. Both operations support local filesystem paths and remote storage (S3, GCS, FTP) for team-scale model sharing.

Key considerations:

  • bentoml models export <tag> <path> creates a portable archive
  • bentoml models import <path> loads an archive into the local store
  • Remote storage URLs are supported (s3://, gs://, ftp://)
  • The fs-s3fs package is required for S3 support
  • Python APIs (bentoml.models.export_model, import_model) provide programmatic access

Step 5: Sync with BentoCloud

Push models to BentoCloud for centralized storage and team collaboration. Pull models from BentoCloud to local development environments. BentoCloud provides a web console for browsing and managing all shared models with access control.

Key considerations:

  • bentoml models push <tag> uploads to BentoCloud registry
  • bentoml models pull <tag> downloads from BentoCloud
  • Requires BentoCloud authentication (bentoml cloud login)
  • Models on BentoCloud are accessible to all team members with appropriate permissions
  • BentoCloud accelerates deployment by caching models close to compute

Step 6: Clean Up Models

Remove models that are no longer needed from the local store to free disk space. Use the CLI or Python API to delete specific model versions or all versions of a model.

Key considerations:

  • bentoml models delete <tag> removes a specific version
  • Use -y flag to skip confirmation prompt
  • Deletion from the local store does not affect BentoCloud copies
  • Ensure no active Bentos reference a model before deleting it

Execution Diagram

GitHub URL

Workflow Repository