Implementation:Bentoml BentoML Models Push Pull
| Implementation Metadata | |
|---|---|
| Implementation Name | Models Push Pull |
| API | bentoml.models.push(), bentoml.models.pull() |
| Source | src/bentoml/models.py:L211-233 (public API), src/bentoml/_internal/cloud/model.py:L56-507 (implementation) |
| Workflow | Model_Store_Management |
| Domain | ML_Serving, Model_Management, Cloud_Deployment |
| Implements | Principle:Bentoml_BentoML_Model_Cloud_Sync |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
The bentoml.models.push() and bentoml.models.pull() functions provide bi-directional synchronization between the local BentoML model store and BentoCloud's centralized registry. They use multipart upload/download with parallel threads for efficient transfer of large model artifacts.
Import
import bentoml
Signatures
def push(tag: Tag | str, *, force: bool = False) -> None
def pull(tag: Tag | str, *, force: bool = False) -> Model | None
Parameters
bentoml.models.push()
| Parameter | Type | Default | Description |
|---|---|---|---|
| tag | Tag or str | required | The model tag to push from the local store to BentoCloud. |
| force | bool | False | If True, overwrite the model in BentoCloud even if a model with the same tag already exists. |
bentoml.models.pull()
| Parameter | Type | Default | Description |
|---|---|---|---|
| tag | Tag or str | required | The model tag to pull from BentoCloud to the local store. |
| force | bool | False | If True, overwrite the local model even if a model with the same tag already exists locally. |
Inputs and Outputs
push()
Inputs:
- Model tag (must exist in local store); requires authenticated BentoCloud session
Outputs:
- None: the model is uploaded to BentoCloud as a side effect
pull()
Inputs:
- Model tag (must exist in BentoCloud); requires authenticated BentoCloud session
Outputs:
- Model | None: the pulled model instance, or None if the pull failed
Internal Implementation Details
The public API functions in src/bentoml/models.py delegate to the cloud implementation in src/bentoml/_internal/cloud/model.py:
- Multipart Upload/Download: Model artifacts are split into chunks and transferred using parallel threads (threads=10 by default) for maximum throughput.
- Progress Tracking: Transfer progress is reported during push/pull operations.
- Manifest Management: The cloud implementation manages model manifests that track the list of files and their checksums for integrity verification.
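The chunked, multi-threaded transfer described above can be illustrated with a minimal standalone sketch. It is not BentoML's code: `split_chunks`, `transfer_chunk`, `parallel_transfer`, and `CHUNK_SIZE` are hypothetical names, and the "transfer" is an in-memory copy standing in for one HTTP part request. Only the thread count of 10 comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow
THREADS = 10    # default thread count stated in the text


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a payload into fixed-size parts; the last part may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def transfer_chunk(chunk: bytes) -> bytes:
    """Stand-in for a single part upload or download; here it only copies."""
    return bytes(chunk)


def parallel_transfer(data: bytes, threads: int = THREADS) -> bytes:
    """Transfer all parts concurrently and reassemble them in order."""
    chunks = split_chunks(data)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in input order, so parts line up again.
        received = list(pool.map(transfer_chunk, chunks))
    return b"".join(received)


payload = b"model weights go here"
copied = parallel_transfer(payload)
```

Because `pool.map` preserves ordering, the parts can complete in any order and still be joined correctly, which is the property a multipart transfer relies on.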
Usage Examples
import bentoml
# Push a model to BentoCloud
bentoml.models.push("text_classifier:latest")
# Push with force overwrite
bentoml.models.push("text_classifier:v2", force=True)
# Pull a model from BentoCloud
model = bentoml.models.pull("text_classifier:production")
if model:
    print(f"Pulled: {model.tag} to {model.path}")
# Pull with force overwrite of local copy
model = bentoml.models.pull("text_classifier:production", force=True)
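Since the documented return type of pull() is Model | None, callers may prefer to fail loudly rather than carry a None forward. The sketch below shows one way to do that. `pull_or_raise` and `fake_pull` are hypothetical helpers, not part of BentoML; the pull function is passed in as an argument so the example runs without a BentoCloud session.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def pull_or_raise(
    pull: Callable[..., Optional[T]], tag: str, *, force: bool = False
) -> T:
    """Call a pull-like function and raise if it returns None."""
    model = pull(tag, force=force)
    if model is None:
        raise RuntimeError(f"pull returned None for {tag!r}")
    return model


def fake_pull(tag: str, *, force: bool = False) -> Optional[dict]:
    """Hypothetical stand-in for bentoml.models.pull, used for illustration."""
    known = {"text_classifier:production"}
    return {"tag": tag} if tag in known else None


model = pull_or_raise(fake_pull, "text_classifier:production")
```

In real use, the first argument would be bentoml.models.pull, assuming it accepts the tag positionally and force as a keyword, as the signature above indicates.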
Authentication
Push and pull operations require an authenticated BentoCloud session. Authentication is typically configured via:
bentoml cloud login --api-token <token> --endpoint <endpoint>
Or by setting environment variables:
export BENTO_CLOUD_API_KEY=<token>
export BENTO_CLOUD_API_ENDPOINT=<endpoint>
Behavior Details
- Push: Reads the model from the local store, uploads artifact files via multipart HTTP upload to BentoCloud, and registers the model metadata. If the model already exists in BentoCloud and force=False, the operation is skipped.
- Pull: Downloads model artifact files via multipart HTTP download from BentoCloud and reconstructs the model in the local store. If the model already exists locally and force=False, the operation is skipped.
- Parallel Transfer: Both push and pull use 10 concurrent threads by default for transferring file chunks, significantly improving throughput for large models.
- Integrity: File checksums are verified after transfer to ensure data integrity.
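The skip-unless-forced rule and the post-transfer checksum check can be modeled with plain dictionaries. This is a simplified sketch under stated assumptions, not the cloud implementation: `build_manifest` and `sync` are hypothetical names, the two stores are in-memory dicts, and SHA-256 is assumed as the checksum because the text does not name the algorithm.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to a hex digest (SHA-256 is an assumption)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def sync(tag: str, source: dict, dest: dict, *, force: bool = False) -> str:
    """Copy source[tag] into dest, skipping existing tags unless force=True."""
    if tag in dest and not force:
        return "skipped"
    files = source[tag]  # raises KeyError if the tag is missing at the source
    expected = build_manifest(files)
    copied = {name: bytes(data) for name, data in files.items()}
    # Verify every file against the manifest before registering the model.
    if build_manifest(copied) != expected:
        raise ValueError(f"checksum mismatch for {tag!r}")
    dest[tag] = copied
    return "transferred"


local = {"text_classifier:v2": {"weights.bin": b"\x00\x01", "config.json": b"{}"}}
cloud: dict = {}

first = sync("text_classifier:v2", local, cloud)               # nothing there yet
second = sync("text_classifier:v2", local, cloud)              # already present
third = sync("text_classifier:v2", local, cloud, force=True)   # overwrite
```

Swapping the `source` and `dest` arguments models the pull direction; the same rule applies with the local store as the destination.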
Source Reference
- Public API: src/bentoml/models.py, lines 211-233
- Cloud implementation: src/bentoml/_internal/cloud/model.py, lines 56-507