Workflow: BentoML Model Store Management
| Knowledge Sources | |
|---|---|
| Domains | ML_Serving, Model_Management, ML_Ops |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
End-to-end process for saving, loading, versioning, and managing ML model artifacts using BentoML's Model Store and cloud registry.
Description
This workflow covers the complete model lifecycle within BentoML: saving trained models into the local Model Store, loading them in services, and sharing them across teams via export/import or BentoCloud. The Model Store provides a local filesystem-backed repository with tag-based versioning (name:version), metadata tracking, and framework-agnostic storage. Models stored there are automatically referenced when building Bentos and are deployed alongside services.
Key capabilities covered:
- Saving models to the local Model Store with bentoml.models.create()
- Loading models from HuggingFace Hub via HuggingFaceModel
- Loading models from the Model Store via BentoModel
- Tag-based model versioning and management
- Model export/import for portability
- Push/pull to BentoCloud for team collaboration
Usage
Execute this workflow when you need to manage model artifacts for BentoML services. This includes saving fine-tuned or custom-trained models, loading pre-trained models from HuggingFace, managing model versions, or sharing models across development and production environments.
Execution Steps
Step 1: Save a Model to the Store
Use bentoml.models.create() as a context manager to register a model in the local Model Store. Within the context, save the model files to the provided path. The Model Store assigns a unique version tag and stores the model in a structured directory. Models can be saved from any framework (PyTorch, TensorFlow, scikit-learn, etc.) by writing their serialized files to the provided path.
Key considerations:
- The context manager ensures proper cleanup if saving fails
- Model metadata (labels, custom metadata) can be attached during creation
- The default store location is ~/bentoml/models/
- Each model version is immutable once saved
- Use model_ref.path to get the directory for saving model files
Step 2: Load a Model in a Service
Declare model references at the class level using BentoModel (for Model Store models) or HuggingFaceModel (for HuggingFace Hub models). Class-level declaration is critical because it registers the model as a service dependency, ensuring it is included when building a Bento. In the constructor, use the model path to load the actual model into memory.
Key considerations:
- BentoModel("name:version") loads from the local store or BentoCloud
- HuggingFaceModel("org/model-id") downloads from HuggingFace Hub
- Models MUST be declared as class attributes, not inside __init__
- BentoModel returns a Model object with a path_of() method
- HuggingFaceModel returns the downloaded model path as a string
- On BentoCloud, models are pre-downloaded during image build for fast cold starts
Step 3: Version and Organize Models
Use tag-based versioning to maintain a clear record of model iterations. Each saved model receives a name:version tag where the version is auto-generated. List, inspect, and manage models using CLI commands or Python APIs. Attach labels and metadata to models for organization.
Key considerations:
- bentoml models list shows all stored models with tags, sizes, and dates
- bentoml models get <tag> retrieves detailed model information
- The :latest alias always points to the most recently saved version
- Labels enable filtering and categorization (e.g., by project or stage)
- Model metadata stores arbitrary key-value pairs (hyperparameters, metrics)
Step 4: Export and Import Models
Export models as standalone archive files (.bentomodel) for sharing between machines or build stages. Import previously exported models into the local Model Store. Both operations support local filesystem paths and remote storage (S3, GCS, FTP) for team-scale model sharing.
Key considerations:
- bentoml models export <tag> <path> creates a portable archive
- bentoml models import <path> loads an archive into the local store
- Remote storage URLs are supported (s3://, gs://, ftp://)
- The fs-s3fs package is required for S3 support
- Python APIs (bentoml.models.export_model, import_model) provide programmatic access
Step 5: Sync with BentoCloud
Push models to BentoCloud for centralized storage and team collaboration. Pull models from BentoCloud to local development environments. BentoCloud provides a web console for browsing and managing all shared models with access control.
Key considerations:
- bentoml models push <tag> uploads to BentoCloud registry
- bentoml models pull <tag> downloads from BentoCloud
- Requires BentoCloud authentication (bentoml cloud login)
- Models on BentoCloud are accessible to all team members with appropriate permissions
- BentoCloud accelerates deployment by caching models close to compute
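The push/pull flow in Step 5 is CLI-driven and requires a live BentoCloud account, so the following is an untested sketch; the tag demo_model:latest is a placeholder:

```shell
# Authenticate once per machine (token from the BentoCloud console).
bentoml cloud login

# Upload a local model version to the BentoCloud registry.
bentoml models push demo_model:latest

# On another machine: fetch the shared model into the local store.
bentoml models pull demo_model:latest
```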
Step 6: Clean Up Models
Remove models that are no longer needed from the local store to free disk space. Use the CLI or Python API to delete specific model versions or all versions of a model.
Key considerations:
- bentoml models delete <tag> removes a specific version
- Use -y flag to skip confirmation prompt
- Deletion from the local store does not affect BentoCloud copies
- Ensure no active Bentos reference a model before deleting it