Implementation:Bentoml BentoML Deployment Create
Overview
Deployment Create implements the Principle:Bentoml_BentoML_Cloud_Deployment_Creation principle by providing the bentoml.deployment.create() function that deploys a Bento artifact to BentoCloud as a running inference endpoint.
API
bentoml.deployment.create()
Source
src/bentoml/deployment.py:L68-130
Import
import bentoml
Signature
```python
def create(
    name: str | None = None,
    path_context: str | None = None,
    *,
    bento: Tag | str | None = None,
    cluster: str | None = None,
    access_authorization: bool | None = None,
    scaling_min: int | None = None,
    scaling_max: int | None = None,
    instance_type: str | None = None,
    strategy: str | None = None,
    envs: list | None = None,
    labels: list | None = None,
    secrets: list[str] | None = None,
    extras: dict | None = None,
    config_dict: dict | None = None,
    config_file: str | None = None,
    args: dict | None = None,
) -> Deployment
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `None` | Unique deployment name (auto-generated if not provided) |
| `bento` | `Tag \| str` | `None` | Bento tag to deploy (e.g., `"my_service:latest"`) |
| `cluster` | `str` | `None` | Target cluster for deployment |
| `scaling_min` | `int` | `None` | Minimum replicas (0 for scale-to-zero) |
| `scaling_max` | `int` | `None` | Maximum replicas for auto-scaling |
| `instance_type` | `str` | `None` | Compute instance type (e.g., `"gpu.a10.1"`) |
| `config_file` | `str` | `None` | Path to YAML config file |
| `envs` | `list` | `None` | Environment variables |
| `secrets` | `list[str]` | `None` | Named secrets from BentoCloud |
| `args` | `dict` | `None` | Additional arguments passed to the deployment |
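As the Full Configuration example below shows, `envs` expects a list of `{"name": ..., "value": ...}` dicts and `labels` a list of `{"key": ..., "value": ...}` dicts. A minimal sketch of hypothetical helpers (not part of the BentoML API) that build these shapes from plain mappings:

```python
def to_envs(variables: dict[str, str]) -> list[dict[str, str]]:
    """Build the [{'name': ..., 'value': ...}] list shape that `envs` expects."""
    return [{"name": k, "value": v} for k, v in variables.items()]

def to_labels(tags: dict[str, str]) -> list[dict[str, str]]:
    """Build the [{'key': ..., 'value': ...}] list shape that `labels` expects."""
    return [{"key": k, "value": v} for k, v in tags.items()]

print(to_envs({"MODEL_ID": "meta-llama/Llama-3-8B"}))
# [{'name': 'MODEL_ID', 'value': 'meta-llama/Llama-3-8B'}]
print(to_labels({"team": "ml-platform"}))
# [{'key': 'team', 'value': 'ml-platform'}]
```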
Inputs and Outputs
Inputs:
- Built Bento tag (local or already pushed to BentoCloud)
- Deployment configuration parameters (inline or via config file)
Outputs:
- Deployment object with the following key attributes:
- `name` - Unique deployment identifier
- `admin_console` - URL to the BentoCloud dashboard
- `cluster` - Target cluster name
- `status` - Current deployment status
Usage Examples
Minimal Deployment
```python
import bentoml

# Deploy with minimal configuration
deployment = bentoml.deployment.create(
    bento="my_service:latest",
)
print(f"Deployed: {deployment.name}")
print(f"Console: {deployment.admin_console}")
```
Full Configuration
```python
import bentoml

deployment = bentoml.deployment.create(
    name="my-llm-service",
    bento="llm_service:v2",
    cluster="gcp-us-central1",
    access_authorization=True,
    scaling_min=1,
    scaling_max=5,
    instance_type="gpu.a10.1",
    strategy="RollingUpdate",
    envs=[{"name": "MODEL_ID", "value": "meta-llama/Llama-3-8B"}],
    secrets=["hf-token"],
    labels=[{"key": "team", "value": "ml-platform"}],
)
```
Using Config File
```python
import bentoml

deployment = bentoml.deployment.create(
    config_file="deployment.yaml",
)
```
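The contents of the referenced deployment.yaml are not shown on this page. A plausible sketch mirroring the keyword parameters above (the field names and nesting are assumptions based on the Python signature, not a verified schema):

```yaml
# Hypothetical deployment.yaml; field names assumed from the create() parameters
name: my-llm-service
bento: llm_service:v2
cluster: gcp-us-central1
access_authorization: true
scaling:
  min_replicas: 1
  max_replicas: 5
envs:
  - name: MODEL_ID
    value: meta-llama/Llama-3-8B
```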
CLI Usage
```shell
# Deploy from CLI
bentoml deploy my_service:latest --name my-deployment --scaling-min 1 --scaling-max 5

# Deploy using config file
bentoml deploy --config deployment.yaml
```
Metadata
| Property | Value |
|---|---|
| Implementation | Deployment Create |
| API | bentoml.deployment.create() |
| Source | src/bentoml/deployment.py:L68-130 |
| Domain | ML_Serving, Cloud_Deployment |
| Workflow | BentoCloud_Deployment |
| Principle | Principle:Bentoml_BentoML_Cloud_Deployment_Creation |