Principle:Bentoml BentoML Model Persistence
| Principle Metadata | |
|---|---|
| Principle Name | Model Persistence |
| Workflow | Model_Store_Management |
| Domain | ML_Serving, Model_Management |
| Related Principle | Principle:Bentoml_BentoML_Model_Versioning |
| Implemented By | Implementation:Bentoml_BentoML_Models_Create |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Model Persistence is the principle of saving trained ML model artifacts into BentoML's versioned local store. It provides a standardized, framework-agnostic mechanism for capturing model files, metadata, and configuration so that models can be reliably retrieved, served, and shared.
Core Concept
Saving ML model artifacts to a versioned local store ensures that every trained model is captured in a reproducible and immutable form. Regardless of the ML framework used (PyTorch, TensorFlow, scikit-learn, XGBoost, etc.), the persistence layer normalizes the storage format so that downstream operations (serving, export, deployment) work uniformly.
Theory
Four design elements let the store handle models from any ML framework in the same way:
- Immutable Tags: Each saved model receives a tag in the form name:version. The version is auto-generated if not explicitly provided, ensuring every save produces a unique, immutable reference. This prevents accidental overwrites and supports reproducibility.
- Filesystem Directory per Model: Each model version gets its own directory under <bentoml_home>/models/<name>/<version>/. This directory contains the serialized artifact files (e.g., .pkl, .pt, .onnx), a model.yaml descriptor, and any custom objects.
- Structured Metadata: Beyond the raw model files, each saved model carries structured metadata including labels (key-value pairs for filtering), arbitrary metadata (metrics, hyperparameters), framework context (Python version, framework version), and API signatures.
- Context Manager Pattern: The save operation uses a context manager to ensure atomic saves with proper cleanup on failure. If any error occurs during the save process, the partially written model directory is cleaned up, preventing corrupted entries in the store.
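The tag, directory layout, and metadata elements above can be sketched with the standard library alone. This is an illustration under stated assumptions, not BentoML's implementation: the helper names (new_version, model_dir, write_descriptor), the UUID-based version scheme, and the JSON descriptor standing in for model.yaml are all hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path


def new_version() -> str:
    """Return a short unique version string (illustrative scheme only)."""
    return uuid.uuid4().hex[:16]


def model_dir(home: Path, name: str, version: str) -> Path:
    """Build the per-version directory: <home>/models/<name>/<version>/."""
    return home / "models" / name / version


def write_descriptor(directory: Path, tag: str, labels: dict, metadata: dict) -> Path:
    """Write a stand-in descriptor (JSON here; the text describes a model.yaml file)."""
    directory.mkdir(parents=True, exist_ok=False)  # fail if this version already exists
    descriptor = directory / "descriptor.json"
    body = {"tag": tag, "labels": labels, "metadata": metadata}
    descriptor.write_text(json.dumps(body, indent=2))
    return descriptor


home = Path(tempfile.mkdtemp())  # throwaway store root for the demo
version = new_version()
tag = f"my_model:{version}"
target = model_dir(home, "my_model", version)
descriptor_path = write_descriptor(target, tag, {"stage": "dev"}, {"accuracy": 0.93})
```

Labels are kept separate from free-form metadata in the descriptor so that a store could filter on the former without parsing the latter.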
Design Principles
Atomicity
The context manager pattern guarantees that the model store is never left in an inconsistent state. Either the entire model (artifacts + metadata + descriptor) is saved successfully, or the operation is rolled back:
import bentoml

with bentoml.models.create("my_model") as model_ref:
    # Write model files into model_ref.path.
    # If an exception occurs here, cleanup happens automatically.
    save_model_artifact(model_ref.path)  # placeholder for framework-specific serialization
# The model is committed to the store only if the block exits without an exception.
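One common way to get this all-or-nothing behavior is to write into a staging directory and rename it into place only on success. The sketch below shows that technique with the standard library; it is not taken from BentoML's source, and atomic_model_dir, the ".staging-" prefix, and the v1/v2 directory names are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def atomic_model_dir(final_dir: Path):
    """Yield a staging directory; publish it to final_dir only on success."""
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the destination so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(dir=final_dir.parent, prefix=".staging-"))
    try:
        yield staging
        staging.rename(final_dir)  # commit step: runs only if the body did not raise
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)  # roll back partial writes
        raise


store = Path(tempfile.mkdtemp())

# Successful save: the staged files appear under the final directory.
ok_dir = store / "my_model" / "v1"
with atomic_model_dir(ok_dir) as path:
    (path / "model.pkl").write_bytes(b"weights")

# Failed save: the staging directory is removed and nothing is published.
bad_dir = store / "my_model" / "v2"
try:
    with atomic_model_dir(bad_dir) as path:
        (path / "model.pkl").write_bytes(b"partial")
        raise RuntimeError("simulated failure during save")
except RuntimeError:
    pass
```

Readers of the store only ever see directories that were renamed into place, so a crash mid-write cannot surface a half-written model.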
Framework Agnosticism
The persistence layer does not impose any specific serialization format. The caller decides how to write model files into the provided directory. BentoML records the framework context (module name, framework version) as metadata, but the actual serialization is delegated to the framework-specific code.
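The division of labor described here can be sketched as a helper that hands a directory to a caller-supplied writer and records only context about the save. The persist function, its parameters, and the recorded fields are hypothetical names chosen for illustration, not part of BentoML's API.

```python
import json
import pickle
import platform
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def persist(directory: Path, module: str, write: Callable[[Path], Any]) -> Dict[str, Any]:
    """Let the caller serialize into `directory`; record only the context."""
    directory.mkdir(parents=True, exist_ok=True)
    write(directory)  # the serialization format is entirely the caller's choice
    return {
        "module": module,
        "python_version": platform.python_version(),
        "files": sorted(p.name for p in directory.iterdir()),
    }


root = Path(tempfile.mkdtemp())
params = {"coef": [0.1, 0.2]}

# One caller pickles a Python object...
pickled = persist(
    root / "a",
    "pickle_demo",
    lambda d: (d / "model.pkl").write_bytes(pickle.dumps(params)),
)

# ...another writes plain JSON; the persistence helper itself is unchanged.
as_json = persist(
    root / "b",
    "json_demo",
    lambda d: (d / "model.json").write_text(json.dumps(params)),
)
```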
Immutability
Once a model version is saved, it cannot be modified in place. A new save always creates a new version. This ensures that any reference to a specific tag always resolves to the same artifact, which is critical for reproducible deployments.
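A toy store makes the immutability rule concrete: every save lands in a fresh version directory, and an attempt to reuse an existing tag is rejected. TinyStore, its counter-based versions, and the model.bin file name are assumptions made for this sketch and do not reflect how BentoML generates versions or enforces the rule.

```python
import itertools
import tempfile
from pathlib import Path
from typing import Optional


class TinyStore:
    """Toy store: one directory per name/version, never rewritten in place."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._counter = itertools.count(1)  # illustrative version scheme

    def save(self, name: str, payload: bytes, version: Optional[str] = None) -> str:
        version = version or f"v{next(self._counter)}"
        target = self.root / name / version
        # exist_ok=False makes a second save to the same tag raise FileExistsError.
        target.mkdir(parents=True, exist_ok=False)
        (target / "model.bin").write_bytes(payload)
        return f"{name}:{version}"

    def load(self, tag: str) -> bytes:
        name, version = tag.split(":", 1)
        return (self.root / name / version / "model.bin").read_bytes()


store = TinyStore(Path(tempfile.mkdtemp()))
first = store.save("my_model", b"weights-a")
second = store.save("my_model", b"weights-b")  # new version; the first is untouched

try:
    store.save("my_model", b"weights-c", version="v1")  # tag already taken
    overwrote = True
except FileExistsError:
    overwrote = False
```

Because the first tag still resolves to its original bytes after the second save, a deployment pinned to that tag keeps loading the same artifact.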
Relationship to Other Principles
- Model Versioning: Persistence creates the versioned artifacts that the versioning system tracks and organizes.
- Model Loading From Store: Persisted models are loaded back via BentoModel descriptors for use in services.
- Model Export/Import: Persisted models can be exported to portable formats for sharing.
- Model Cloud Sync: Persisted models can be pushed to BentoCloud for centralized team access.