Principle:Bentoml BentoML Model Loading From Store

From Leeroopedia
Principle Metadata
Principle Name: Model Loading From Store
Workflow: Model_Store_Management
Domain: ML_Serving, Model_Management
Related Principle: Principle:Bentoml_BentoML_Model_Persistence
Implemented By: Implementation:Bentoml_BentoML_BentoModel_Descriptor
Last Updated: 2026-02-13 15:00 GMT

Overview

Model Loading From Store is the principle of resolving and loading saved model artifacts from the BentoML local model store into running services. It uses a descriptor-based approach that enables declarative model dependencies with lazy resolution, automatic cloud pull, and seamless integration with the service lifecycle.

Core Concept

Loading models from the BentoML store into services requires a mechanism that is both declarative (models are specified at class definition time) and lazy (resolution happens at runtime when the model is actually needed). The BentoModel descriptor pattern achieves this by acting as a proxy that resolves the actual model artifact on first access.

Theory

BentoModel provides a descriptor-based approach to referencing models by tag. When accessed on a service instance, it lazily resolves the model from the local store or automatically pulls it from BentoCloud. This provides a declarative way to specify model dependencies.

The key aspects of this approach are:

  • Descriptor Pattern: BentoModel uses Python's descriptor protocol. When declared as a class attribute on a service, it intercepts attribute access to trigger model resolution. This means the model tag is declared once, and actual loading is deferred until the service needs it.
  • Lazy Resolution: The model is not loaded when the service class is defined or even when it is instantiated. Resolution occurs when the model attribute is first accessed, allowing the system to defer expensive I/O until it is truly needed.
  • Automatic Cloud Pull: If the model is not found in the local store, BentoModel can automatically pull it from BentoCloud. This eliminates the need for manual bentoml.models.pull() calls in deployment scripts and simplifies CI/CD pipelines.
  • Store-Centric Resolution: Unlike loading models directly from external sources (e.g., HuggingFace Hub), this principle centers on the BentoML model store as the canonical source. Models must first be saved to the store (via bentoml.models.create()) before they can be loaded via BentoModel.
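The descriptor pattern and lazy resolution described above can be illustrated with a small, library-free sketch. This is not BentoML's implementation: LazyModelRef, fake_resolver, and the returned dictionary are hypothetical names chosen for illustration, and the resolver simply records that it was called instead of touching a real store.

```python
class LazyModelRef:
    """Toy descriptor: declared with a tag, resolved on first instance access."""

    def __init__(self, tag, resolver):
        self.tag = tag
        self.resolver = resolver  # hypothetical callable: tag -> resolved object

    def __set_name__(self, owner, name):
        # Remember where to cache the resolved object on each instance.
        self.slot = "_resolved_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        if self.slot not in instance.__dict__:
            # First access on this instance: run the (expensive) resolution once.
            instance.__dict__[self.slot] = self.resolver(self.tag)
        return instance.__dict__[self.slot]


calls = []  # records every resolution so laziness is observable


def fake_resolver(tag):
    """Stand-in for a store lookup; the returned path is made up."""
    calls.append(tag)
    return {"tag": tag, "path": "/tmp/models/" + tag.replace(":", "/")}


class MyService:
    model = LazyModelRef("my_classifier:latest", fake_resolver)


svc = MyService()
calls_after_init = list(calls)  # nothing has been resolved yet
first = svc.model               # triggers resolution
second = svc.model              # served from the per-instance cache
```

In this sketch, defining the class and instantiating it perform no resolution; only the first read of svc.model calls the resolver, and later reads reuse the cached result.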

Distinction From External Model Loading

This principle is distinct from Model Loading for Serving (e.g., loading from HuggingFace). The key differences are:

Aspect          | Model Loading From Store                  | External Model Loading
Source          | BentoML local store                       | External provider (HuggingFace, etc.)
Mechanism       | BentoModel descriptor with tag resolution | Framework-specific loaders
Versioning      | BentoML tag-based (name:version)          | Provider-specific versioning
Offline Support | Fully offline from local store            | Requires network for first download
Cloud Fallback  | Auto-pulls from BentoCloud                | Provider-dependent caching

Design Principles

Declarative Dependencies

Models are declared as class-level attributes, making it immediately clear which models a service depends on:

import bentoml

@bentoml.service
class MyService:
    # Declared once at class level; resolved from the store on first access.
    model = bentoml.models.BentoModel("my_classifier:latest")

Transparent Resolution

The resolution process (local lookup, optional cloud pull) is transparent to the service code. The service simply accesses self.model and receives a resolved model with a .path to the artifact files.
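The resolution order (local lookup first, then an optional pull) can be sketched with two temporary directories standing in for the local store and the remote side. Everything here is an assumption made for illustration: ResolvedModel, resolve, the name/version directory layout, and the use of a directory copy in place of a real BentoCloud pull are not BentoML APIs.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ResolvedModel:
    """Hypothetical result type: a tag plus the directory holding the artifacts."""
    tag: str
    path: Path


def resolve(tag: str, local_store: Path, remote_store: Path) -> ResolvedModel:
    """Look in the local store first; copy from the remote store only on a miss."""
    name, version = tag.split(":")
    local_dir = local_store / name / version
    if not local_dir.is_dir():
        remote_dir = remote_store / name / version
        if not remote_dir.is_dir():
            raise FileNotFoundError(f"model {tag!r} not found locally or remotely")
        shutil.copytree(remote_dir, local_dir)  # stand-in for a cloud pull
    return ResolvedModel(tag=tag, path=local_dir)


# Set up an empty local store and a remote store that holds one model version.
local_store = Path(tempfile.mkdtemp())
remote_store = Path(tempfile.mkdtemp())
(remote_store / "my_classifier" / "v1").mkdir(parents=True)
(remote_store / "my_classifier" / "v1" / "weights.bin").write_bytes(b"\x00\x01")

# First call misses locally and pulls; the caller only ever sees a local path.
model = resolve("my_classifier:v1", local_store, remote_store)

# Once the model is local, resolution no longer needs the remote side.
shutil.rmtree(remote_store)
model_offline = resolve("my_classifier:v1", local_store, remote_store)
```

The calling code is identical in both cases, which is the sense in which resolution is transparent: whether a pull happened is invisible to the service, and the result always exposes a local path.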

Immutable References

Once resolved, the model reference is fixed for the lifetime of the service instance, ensuring consistent behavior across requests.
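One way to picture a fixed reference is a floating tag such as latest being pinned to a concrete version at first resolution. The sketch below uses an in-memory store; ToyStore, PinnedRef, and resolved_tag are hypothetical names, and the pinning behavior shown is an illustration of the principle, not a description of BentoML internals.

```python
class ToyStore:
    """In-memory stand-in for a model store: name -> versions, oldest first."""

    def __init__(self):
        self.versions = {}

    def save(self, name, version):
        self.versions.setdefault(name, []).append(version)

    def resolve(self, tag):
        """Turn 'name:latest' (or bare 'name') into a concrete 'name:version'."""
        name, _, version = tag.partition(":")
        known = self.versions[name]
        if version in ("", "latest"):
            return f"{name}:{known[-1]}"
        if version not in known:
            raise KeyError(tag)
        return f"{name}:{version}"


class PinnedRef:
    """Resolves its tag once and returns the same concrete tag afterwards."""

    def __init__(self, store, tag):
        self.store = store
        self.tag = tag
        self._resolved = None

    @property
    def resolved_tag(self):
        if self._resolved is None:
            self._resolved = self.store.resolve(self.tag)
        return self._resolved


store = ToyStore()
store.save("my_classifier", "v1")

ref = PinnedRef(store, "my_classifier:latest")
before = ref.resolved_tag            # pinned to the newest version at this moment

store.save("my_classifier", "v2")    # a newer version appears later
after = ref.resolved_tag             # the existing reference does not move

fresh = PinnedRef(store, "my_classifier:latest").resolved_tag  # a new reference sees v2
```

Under these assumptions, requests handled by an already-running instance keep seeing the same version, while a newly created reference picks up the newer one.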

Relationship to Other Principles

  • Model Persistence: Models must be persisted before they can be loaded from the store.
  • Model Cloud Sync: BentoModel leverages push/pull to resolve models not found locally.
  • Model Versioning: The tag-based resolution uses the versioning system to find the correct model.
