Principle:Bentoml BentoML Model Persistence

Principle Metadata
Principle Name	Model Persistence
Workflow	Model_Store_Management
Domain	ML_Serving, Model_Management
Related Principle	Principle:Bentoml_BentoML_Model_Versioning
Implemented By	Implementation:Bentoml_BentoML_Models_Create
Last Updated	2026-02-13 15:00 GMT

Overview

Model Persistence is the principle of saving trained ML model artifacts into BentoML's versioned local store. It provides a standardized, framework-agnostic mechanism for capturing model files, metadata, and configuration so that models can be reliably retrieved, served, and shared.

Core Concept

Saving ML model artifacts to a versioned local store ensures that every trained model is captured in a reproducible and immutable form. Regardless of the ML framework used (PyTorch, TensorFlow, scikit-learn, XGBoost, etc.), the persistence layer normalizes the storage layout (one versioned directory per model plus a model.yaml descriptor), so that downstream operations such as serving, export, and deployment work uniformly.

Theory

Model persistence provides a standardized way to save trained models from any ML framework into BentoML's versioned store. The key design elements are:

Immutable Tags: Each saved model receives a tag of the form name:version. If no version is provided, one is generated automatically, so every save produces a new, unique, immutable reference. This prevents accidental overwrites and supports reproducibility.
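The tag behavior can be sketched in plain Python to make the name:version contract concrete. `Tag`, `make_tag`, and the uuid-based version string are hypothetical stand-ins chosen for illustration; BentoML generates versions with its own scheme, so treat this as a model of the idea rather than the library's code.

```python
import uuid
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    """Hypothetical name:version reference to one saved model."""
    name: str
    version: str

    def __str__(self) -> str:
        return f"{self.name}:{self.version}"

    @classmethod
    def parse(cls, text: str) -> "Tag":
        # Split on the first colon only: "name:version" -> ("name", "version").
        name, sep, version = text.partition(":")
        if not sep or not name or not version:
            raise ValueError(f"expected 'name:version', got {text!r}")
        return cls(name, version)


def make_tag(name: str, version: Optional[str] = None) -> Tag:
    # When no version is given, generate a random one so each save is unique.
    # uuid4 is an assumed stand-in for BentoML's own version generator.
    return Tag(name, version if version is not None else uuid.uuid4().hex[:16])
```

Because an omitted version is always freshly generated, two saves of "my_model" yield two distinct tags, while an explicit tag such as "my_model:v1" always names the same artifact.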

Filesystem Directory per Model Version: Each model version gets its own directory under <bentoml_home>/models/<name>/<version>/. This directory contains the serialized artifact files (e.g., .pkl, .pt, .onnx), a model.yaml descriptor, and any custom objects.
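The mapping from a tag to its directory is simple path arithmetic. The sketch below assumes a caller-supplied `bentoml_home` path and uses a hypothetical `model_dir` helper; it also rejects path separators, a precaution assumed here so that a malformed tag cannot point outside the store.

```python
from pathlib import Path


def model_dir(bentoml_home: Path, name: str, version: str) -> Path:
    """Return <bentoml_home>/models/<name>/<version>/ for one model version."""
    # Reject empty parts and separators so a tag cannot escape the store.
    for part in (name, version):
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"invalid tag component: {part!r}")
    # Artifact files and the descriptor would live directly inside this path.
    return bentoml_home / "models" / name / version
```

Keeping one directory per version means that deleting, exporting, or copying a model is a whole-directory operation.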

Structured Metadata: Beyond the raw model files, each saved model carries structured metadata including labels (key-value pairs for filtering), arbitrary metadata (metrics, hyperparameters), framework context (Python version, framework version), and API signatures.
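A descriptor carrying these four kinds of metadata can be sketched as a dataclass. `ModelDescriptor` and its field names are loosely modeled on the categories listed above and are assumptions, not the exact model.yaml schema; JSON is used in place of YAML only to avoid a third-party dependency.

```python
import json
import platform
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ModelDescriptor:
    """Hypothetical stand-in for the metadata stored next to model files."""
    name: str
    version: str
    module: str                                              # framework integration that saved the model
    labels: Dict[str, str] = field(default_factory=dict)     # key-value pairs for filtering
    metadata: Dict[str, Any] = field(default_factory=dict)   # metrics, hyperparameters
    context: Dict[str, str] = field(default_factory=lambda: {
        "python_version": platform.python_version(),         # captured at save time
    })
    signatures: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def to_json(self) -> str:
        # BentoML writes model.yaml; JSON keeps this sketch dependency-free.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping labels separate from free-form metadata lets a store filter on a small set of string pairs without parsing arbitrary nested values.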

Context Manager Pattern: The save operation uses a context manager to ensure atomic saves with proper cleanup on failure. If any error occurs during the save process, the partially written model directory is cleaned up, preventing corrupted entries in the store.

Design Principles

Atomicity

The context manager pattern guarantees that the model store is never left in an inconsistent state. Either the entire model (artifacts + metadata + descriptor) is saved successfully, or the operation is rolled back:

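import os
import pickle
import bentoml

model = {"weights": [0.1, 0.2, 0.3]}  # stand-in for a trained model object
with bentoml.models.create("my_model") as model_ref:
    # Write files into model_ref.path; an exception here triggers cleanup
    with open(os.path.join(model_ref.path, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
# Committed to the store only if the block exits without an exception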
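The same all-or-nothing behavior can be reproduced without BentoML by staging files in a temporary directory and renaming it into place only on success. `atomic_model_dir` is a hypothetical helper illustrating the pattern, not BentoML's internals; it assumes the staging and final directories sit on the same filesystem, where a directory rename is atomic on POSIX systems.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def atomic_model_dir(store: Path, name: str, version: str) -> Iterator[Path]:
    """Yield a staging directory and publish it only if the block succeeds."""
    final = store / "models" / name / version
    if final.exists():
        raise FileExistsError(f"{final} already exists; versions are immutable")
    final.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the final location so the rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=f".{version}-", dir=final.parent))
    try:
        yield staging
        os.rename(staging, final)  # publish: readers never see a partial model
    except BaseException:
        # Any failure inside the with-block removes the partial files.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

Usage mirrors the BentoML example: write files into the yielded path, and if serialization raises, no entry for that version appears in the store.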

Framework Agnosticism

The persistence layer does not impose any specific serialization format. The caller decides how to write model files into the provided directory. BentoML records the framework context (module name, framework version) as metadata, but the actual serialization is delegated to the framework-specific code.
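The division of labor can be sketched with a caller-supplied serializer: the store only provides a directory and records context, while the callable decides the file format. `save_with`, `pickle_serializer`, and `json_serializer` are hypothetical names used for illustration.

```python
import json
import pickle
import platform
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict

# A serializer writes a model into a directory; the store never inspects the format.
Serializer = Callable[[Any, Path], None]


def pickle_serializer(model: Any, dest: Path) -> None:
    with open(dest / "model.pkl", "wb") as f:
        pickle.dump(model, f)


def json_serializer(model: Any, dest: Path) -> None:
    (dest / "model.json").write_text(json.dumps(model))


def save_with(model: Any, dest: Path, serialize: Serializer, framework: str) -> Dict[str, str]:
    """Delegate file writing to `serialize`; return only context metadata."""
    dest.mkdir(parents=True, exist_ok=True)
    serialize(model, dest)
    return {"framework_name": framework, "python_version": platform.python_version()}
```

Swapping the serializer changes the files on disk but not the store's logic, which is what allows one persistence layer to serve many frameworks.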

Immutability

Once a model version is saved, it cannot be modified in place. A new save always creates a new version. This ensures that any reference to a specific tag always resolves to the same artifact, which is critical for reproducible deployments.
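One simple way to enforce "never overwrite a version" on a filesystem is to let directory creation fail when the target exists. `create_version_dir` is a hypothetical helper; the guarantee it relies on is that `Path.mkdir(exist_ok=False)` raises `FileExistsError` for an existing directory, so two savers cannot both claim the same version.

```python
import tempfile
from pathlib import Path


def create_version_dir(store: Path, name: str, version: str) -> Path:
    """Create a new version directory, refusing to reuse an existing one."""
    target = store / "models" / name / version
    target.parent.mkdir(parents=True, exist_ok=True)
    # exist_ok=False makes mkdir raise FileExistsError instead of reusing it.
    target.mkdir(exist_ok=False)
    return target
```

A new save of the same model therefore needs a new version string, which keeps every existing tag pointing at its original files.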

Relationship to Other Principles

Model Versioning: Persistence creates the versioned artifacts that the versioning system tracks and organizes.
Model Loading From Store: Persisted models are loaded back via BentoModel descriptors for use in services.
Model Export/Import: Persisted models can be exported to portable formats for sharing.
Model Cloud Sync: Persisted models can be pushed to BentoCloud for centralized team access.
