Implementation:Bentoml BentoML Deployment Create
Overview
Deployment Create implements the Principle:Bentoml_BentoML_Cloud_Deployment_Creation principle by providing the bentoml.deployment.create() function that deploys a Bento artifact to BentoCloud as a running inference endpoint.
API
bentoml.deployment.create()
Source
src/bentoml/deployment.py:L68-130
Import
import bentoml
Signature
```python
def create(
    name: str | None = None,
    path_context: str | None = None,
    *,
    bento: Tag | str | None = None,
    cluster: str | None = None,
    access_authorization: bool | None = None,
    scaling_min: int | None = None,
    scaling_max: int | None = None,
    instance_type: str | None = None,
    strategy: str | None = None,
    envs: list | None = None,
    labels: list | None = None,
    secrets: list[str] | None = None,
    extras: dict | None = None,
    config_dict: dict | None = None,
    config_file: str | None = None,
    args: dict | None = None,
) -> Deployment
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `None` | Unique deployment name (auto-generated if not provided) |
| `bento` | `Tag \| str` | `None` | Bento tag to deploy (e.g., `"my_service:latest"`) |
| `cluster` | `str` | `None` | Target cluster for deployment |
| `scaling_min` | `int` | `None` | Minimum replicas (0 for scale-to-zero) |
| `scaling_max` | `int` | `None` | Maximum replicas for auto-scaling |
| `instance_type` | `str` | `None` | Compute instance type (e.g., `"gpu.a10.1"`) |
| `config_file` | `str` | `None` | Path to YAML config file |
| `envs` | `list` | `None` | Environment variables |
| `secrets` | `list[str]` | `None` | Named secrets from BentoCloud |
| `args` | `dict` | `None` | Additional arguments passed to the deployment |
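As the Full Configuration example below shows, `envs` expects a list of `{"name": ..., "value": ...}` dicts and `labels` a list of `{"key": ..., "value": ...}` dicts. A minimal sketch of hypothetical helpers (not part of the BentoML API) that build these shapes from plain mappings:

```python
def to_envs(variables: dict[str, str]) -> list[dict[str, str]]:
    """Build the [{'name': ..., 'value': ...}] list shape that `envs` expects."""
    return [{"name": k, "value": v} for k, v in variables.items()]

def to_labels(tags: dict[str, str]) -> list[dict[str, str]]:
    """Build the [{'key': ..., 'value': ...}] list shape that `labels` expects."""
    return [{"key": k, "value": v} for k, v in tags.items()]

print(to_envs({"MODEL_ID": "meta-llama/Llama-3-8B"}))
# [{'name': 'MODEL_ID', 'value': 'meta-llama/Llama-3-8B'}]
print(to_labels({"team": "ml-platform"}))
# [{'key': 'team', 'value': 'ml-platform'}]
```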
Inputs and Outputs
Inputs:
- Built Bento tag (local or already pushed to BentoCloud)
- Deployment configuration parameters (inline or via config file)
Outputs:
- Deployment object with the following key attributes:
- `name` - Unique deployment identifier
- `admin_console` - URL to the BentoCloud dashboard
- `cluster` - Target cluster name
- `status` - Current deployment status
Usage Examples
Minimal Deployment
```python
import bentoml

# Deploy with minimal configuration
deployment = bentoml.deployment.create(
    bento="my_service:latest",
)
print(f"Deployed: {deployment.name}")
print(f"Console: {deployment.admin_console}")
```
Full Configuration
```python
import bentoml

deployment = bentoml.deployment.create(
    name="my-llm-service",
    bento="llm_service:v2",
    cluster="gcp-us-central1",
    access_authorization=True,
    scaling_min=1,
    scaling_max=5,
    instance_type="gpu.a10.1",
    strategy="RollingUpdate",
    envs=[{"name": "MODEL_ID", "value": "meta-llama/Llama-3-8B"}],
    secrets=["hf-token"],
    labels=[{"key": "team", "value": "ml-platform"}],
)
```
Using Config File
```python
import bentoml

deployment = bentoml.deployment.create(
    config_file="deployment.yaml",
)
```
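The contents of the referenced deployment.yaml are not shown on this page. A plausible sketch mirroring the keyword parameters above (the field names and nesting are assumptions based on the Python signature, not a verified schema):

```yaml
# Hypothetical deployment.yaml; field names assumed from the create() parameters
name: my-llm-service
bento: llm_service:v2
cluster: gcp-us-central1
access_authorization: true
scaling:
  min_replicas: 1
  max_replicas: 5
envs:
  - name: MODEL_ID
    value: meta-llama/Llama-3-8B
```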
CLI Usage
```shell
# Deploy from CLI
bentoml deploy my_service:latest --name my-deployment --scaling-min 1 --scaling-max 5

# Deploy using config file
bentoml deploy --config deployment.yaml
```
Metadata
| Property | Value |
|---|---|
| Implementation | Deployment Create |
| API | bentoml.deployment.create() |
| Source | src/bentoml/deployment.py:L68-130 |
| Domain | ML_Serving, Cloud_Deployment |
| Workflow | BentoCloud_Deployment |
| Principle | Principle:Bentoml_BentoML_Cloud_Deployment_Creation |