Implementation:Bentoml BentoML Models Push Pull

Implementation Metadata
Implementation Name	Models Push Pull
API	`bentoml.models.push()`, `bentoml.models.pull()`
Source	`src/bentoml/models.py:L211-233` (public API), `src/bentoml/_internal/cloud/model.py:L56-507` (implementation)
Workflow	Model_Store_Management
Domain	ML_Serving, Model_Management, Cloud_Deployment
Implements	Principle:Bentoml_BentoML_Model_Cloud_Sync
Last Updated	2026-02-13 15:00 GMT

Overview

The bentoml.models.push() and bentoml.models.pull() functions provide bi-directional synchronization between the local BentoML model store and BentoCloud's centralized registry. They use multipart upload/download with parallel threads for efficient transfer of large model artifacts.

Import

import bentoml

Signatures

def push(tag: Tag | str, *, force: bool = False) -> None

def pull(tag: Tag | str, *, force: bool = False) -> Model | None

Parameters

bentoml.models.push()

Parameter	Type	Default	Description
`tag`	`Tag | str`	required	The model tag to push from the local store to BentoCloud.
`force`	`bool`	`False`	If `True`, overwrite the model in BentoCloud even if a model with the same tag already exists.

bentoml.models.pull()

Parameter	Type	Default	Description
`tag`	`Tag | str`	required	The model tag to pull from BentoCloud to the local store.
`force`	`bool`	`False`	If `True`, overwrite the local model even if a model with the same tag already exists locally.

Inputs and Outputs

push()

Inputs:

Model tag (must exist in local store); requires authenticated BentoCloud session

Outputs:

None — the model is uploaded to BentoCloud as a side effect

pull()

Inputs:

Model tag (must exist in BentoCloud); requires authenticated BentoCloud session

Outputs:

Model | None — the pulled model instance, or None if the pull failed

Internal Implementation Details

The public API functions in src/bentoml/models.py delegate to the cloud implementation in src/bentoml/_internal/cloud/model.py:

Multipart Upload/Download: Model artifacts are split into chunks and transferred by parallel threads (threads=10 by default), so a large artifact does not have to move as a single sequential stream.
Progress Tracking: Transfer progress is reported during push/pull operations.
Manifest Management: The cloud implementation manages model manifests that track the list of files and their checksums for integrity verification.
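
The chunk-and-thread pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not BentoML's actual code: `split_ranges`, `transfer_in_parallel`, and the `send_chunk` callback are hypothetical names, and the callback stands in for whatever HTTP request would carry one part. Only the default of 10 threads is taken from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges that cover total_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]


def transfer_in_parallel(
    data: bytes,
    send_chunk: Callable[[int, bytes], None],
    chunk_size: int,
    threads: int = 10,  # mirrors the default mentioned in the text
) -> int:
    """Pass every chunk of data to send_chunk from a thread pool.

    send_chunk is a hypothetical stand-in for one part of a multipart
    HTTP transfer. Returns the number of chunks that were sent.
    """
    ranges = split_ranges(len(data), chunk_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(send_chunk, index, data[start:end])
            for index, (start, end) in enumerate(ranges)
        ]
        for future in futures:
            future.result()  # re-raise any exception raised in a worker
    return len(ranges)


# Usage: collect the parts in memory instead of sending them over HTTP.
received = {}
payload = bytes(range(256)) * 100
sent = transfer_in_parallel(payload, received.__setitem__, chunk_size=4096)
reassembled = b"".join(received[i] for i in sorted(received))
```

Each part carries its index, so the receiver can reassemble the artifact regardless of the order in which threads finish.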

Usage Examples

import bentoml

# Push a model to BentoCloud
bentoml.models.push("text_classifier:latest")

# Push with force overwrite
bentoml.models.push("text_classifier:v2", force=True)

# Pull a model from BentoCloud
model = bentoml.models.pull("text_classifier:production")
if model:
    print(f"Pulled: {model.tag} to {model.path}")

# Pull with force overwrite of local copy
model = bentoml.models.pull("text_classifier:production", force=True)
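
Because `pull()` is documented here as returning `Model | None`, callers that cannot proceed without the model may prefer to turn a `None` result into an exception. The wrapper below is a minimal sketch; `pull_or_raise` is a hypothetical helper, and the pull function is passed in as an argument so the snippet can be exercised without a BentoCloud session.

```python
from typing import Callable, Optional, TypeVar

M = TypeVar("M")


def pull_or_raise(
    tag: str,
    pull: Callable[..., Optional[M]],
    *,
    force: bool = False,
) -> M:
    """Call pull(tag, force=force) and raise if it returns None."""
    model = pull(tag, force=force)
    if model is None:
        raise RuntimeError(f"pull of {tag!r} returned None")
    return model


# With BentoML installed and an authenticated session, the call would be:
#   model = pull_or_raise("text_classifier:production", bentoml.models.pull)
```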

Authentication

Push and pull operations require an authenticated BentoCloud session. Authentication is typically configured via:

bentoml cloud login --api-token <token> --endpoint <endpoint>

Or by setting environment variables:

export BENTO_CLOUD_API_KEY=<token>
export BENTO_CLOUD_API_ENDPOINT=<endpoint>

Behavior Details

Push: Reads the model from the local store, uploads artifact files via multipart HTTP upload to BentoCloud, and registers the model metadata. If the model already exists in BentoCloud and force=False, the operation is skipped.
Pull: Downloads model artifact files via multipart HTTP download from BentoCloud and reconstructs the model in the local store. If the model already exists locally and force=False, the operation is skipped.
Parallel Transfer: Both push and pull use 10 concurrent threads by default to transfer file chunks, which is intended to improve throughput for large models compared with a sequential transfer.
Integrity: File checksums are verified after transfer to ensure data integrity.
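
Two of the behaviors above, skipping unless `force` is set and verifying checksums after transfer, reduce to small pieces of logic. The sketch below is illustrative only: the function names are hypothetical, and SHA-256 is an assumption, since this page does not say which hash algorithm BentoCloud uses.

```python
import hashlib
import io
from typing import BinaryIO


def should_transfer(exists_at_destination: bool, force: bool) -> bool:
    """Transfer when the destination lacks the model or force is set."""
    return force or not exists_at_destination


def sha256_of(stream: BinaryIO, block_size: int = 1 << 20) -> str:
    """Hash a binary stream in blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the stream's digest with the checksum recorded for the file."""
    return sha256_of(stream) == expected_hex


# Usage: an in-memory "file" checked against its recorded checksum.
artifact = b"model weights"
recorded = hashlib.sha256(artifact).hexdigest()
intact = verify(io.BytesIO(artifact), recorded)
corrupted = verify(io.BytesIO(artifact + b"!"), recorded)
```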

Source Reference

Public API: src/bentoml/models.py, lines 211-233
Cloud implementation: src/bentoml/_internal/cloud/model.py, lines 56-507
