Implementation:Bentoml BentoML SyncHTTPClient For Cloud

Overview

SyncHTTPClient For Cloud implements the Principle:Bentoml_BentoML_Cloud_Endpoint_Invocation principle. It provides client methods for invoking the endpoints of a deployed BentoML service. The client is obtained either through the Deployment.get_client() method or by instantiating SyncHTTPClient directly with a cloud URL.

API

Deployment.get_client()
SyncHTTPClient.from_url() (cloud context)

Source

get_client: src/bentoml/_internal/cloud/deployment.py:L598-631
SyncHTTPClient: src/bentoml/_internal/client/http.py:L237-288

Import

from bentoml.client import SyncHTTPClient

Signature

# Via Deployment object
deployment.get_client() -> SyncHTTPClient

# Via direct URL
SyncHTTPClient.from_url(server_url: str) -> SyncHTTPClient

Key Difference from Local

When used in a cloud context, the SyncHTTPClient differs from local usage in these ways:

Aspect	Local	Cloud
URL	`http://localhost:3000`	Deployment endpoint URL
Authentication	Not required	May include auth headers from BentoCloud context
Creation	`SyncHTTPClient("http://localhost:3000")`	`deployment.get_client()` or `SyncHTTPClient.from_url(url)`
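The distinction in the table can be captured in a small configuration helper. The sketch below is illustrative only and uses just the standard library: `ClientSettings`, `resolve_settings`, and the environment variable names `SERVICE_URL` and `SERVICE_TOKEN` are hypothetical and not part of BentoML, and the `Authorization: Bearer` header is an assumption about how a protected endpoint expects its token.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Mapping

LOCAL_URL = "http://localhost:3000"  # local default taken from the table


@dataclass(frozen=True)
class ClientSettings:
    """Connection settings for a local or cloud endpoint (hypothetical helper)."""

    url: str
    headers: dict[str, str] = field(default_factory=dict)


def resolve_settings(env: Mapping[str, str]) -> ClientSettings:
    """Return cloud settings when SERVICE_URL is set, otherwise local defaults."""
    url = env.get("SERVICE_URL")  # hypothetical variable name
    if not url:
        # Local case: fixed URL, no authentication headers.
        return ClientSettings(url=LOCAL_URL)
    headers = {}
    token = env.get("SERVICE_TOKEN")  # hypothetical variable name
    if token:
        # Assumption: the endpoint accepts a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return ClientSettings(url=url.rstrip("/"), headers=headers)
```

In practice the mapping passed in would typically be `os.environ`; taking it as a parameter keeps the helper easy to test.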

Inputs and Outputs

Inputs:

Deployment object - When using get_client(), the deployment's endpoint URL and authentication are handled automatically
Deployment endpoint URL - When using SyncHTTPClient.from_url(), the cloud endpoint URL must be provided directly

Outputs:

SyncHTTPClient instance with dynamically generated API methods matching the deployed service's endpoints

Usage Examples

Via Deployment Object

import bentoml

# Get an existing deployment
deployment = bentoml.deployment.get("my-llm-service")

# Get a client with auto-configured auth
client = deployment.get_client()

# Call the service endpoint
result = client.generate(prompt="Hello, world!", max_tokens=100)
print(result)

Via Direct URL

from bentoml.client import SyncHTTPClient

# Create client directly from cloud URL
client = SyncHTTPClient.from_url("https://my-llm-service-abc123.bentoml.com")

# Call the service endpoint
result = client.generate(prompt="Hello, world!", max_tokens=100)
print(result)
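To make concrete what a call such as client.generate(...) might amount to on the wire, the following sketch builds (but does not send) an equivalent HTTP request using only the standard library. It rests on assumptions not stated on this page: that each API is exposed as a POST route named after the method, that arguments travel as a JSON body, and that a protected endpoint accepts a bearer token. `build_endpoint_request` is a hypothetical helper, not a BentoML function, and the real client may serialize or authenticate differently.

```python
import json
import urllib.request


def build_endpoint_request(base_url, api_name, payload, token=None):
    """Build (but do not send) a POST request for one service endpoint."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: protected endpoints accept a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{api_name}",  # assumed route: /<api name>
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_endpoint_request(
    "https://my-llm-service-abc123.bentoml.com",
    "generate",
    {"prompt": "Hello, world!", "max_tokens": 100},
    token="example-token",  # placeholder value
)
print(req.get_method(), req.full_url)
```

Seeing the request in this form can help when debugging a deployment with a generic HTTP tool before switching to the generated client methods.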

Within Create Flow

import bentoml

# Create a deployment
deployment = bentoml.deployment.create(
    bento="my_service:latest",
    scaling_min=1,
)

# Wait for the deployment to be ready, then get a client
deployment.wait_until_ready(timeout=3600)
client = deployment.get_client()
response = client.predict(input_data=[1.0, 2.0, 3.0])
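A freshly created deployment usually needs time to start before its endpoints accept requests. Where custom readiness logic is needed, a generic polling loop is one option. The sketch below is independent of BentoML: `wait_until` and its `probe` callable are hypothetical names, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_until(probe, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or the timeout elapses.

    Returns True if the probe succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The probe could wrap a deployment status check or a request to the service's readiness route; which one is appropriate depends on the BentoML version in use. Only call the endpoint once wait_until returns True.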

Dynamic Method Generation

The client introspects the service's OpenAPI specification to generate methods. For a service defined as:

@bentoml.service
class MyService:
    @bentoml.api
    def predict(self, input_data: list[float]) -> float:
        ...

    @bentoml.api
    def classify(self, text: str) -> dict:
        ...

The client will have corresponding client.predict() and client.classify() methods with matching signatures.
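The idea of generating methods from a specification can be illustrated with a toy class. This is a minimal sketch of the general technique, not BentoML's actual implementation: `SketchClient`, the `transport` callable, and the simplified OpenAPI-like dictionary are all hypothetical, and unlike the real client the generated methods accept arbitrary keyword arguments rather than mirroring each endpoint's signature.

```python
from functools import partial


class SketchClient:
    """Toy client exposing one method per POST route in an OpenAPI-like spec."""

    def __init__(self, spec, transport):
        self._transport = transport  # callable(path, payload) -> response
        for path, operations in spec.get("paths", {}).items():
            if "post" not in operations:
                continue  # skip routes that are not callable APIs
            name = path.strip("/")
            if name.isidentifier():
                # Bind the route path so client.<name>(**kwargs) calls it.
                setattr(self, name, partial(self._call, path))

    def _call(self, path, **kwargs):
        # Keyword arguments become the payload for the endpoint.
        return self._transport(path, kwargs)


spec = {"paths": {"/predict": {"post": {}}, "/classify": {"post": {}}, "/docs": {"get": {}}}}
sent = []  # records what the fake transport was asked to send
client = SketchClient(spec, transport=lambda path, payload: sent.append((path, payload)) or "ok")
print(client.predict(input_data=[1.0, 2.0, 3.0]))
```

Here the fake transport only records the call and returns a fixed value; a real transport would perform the HTTP request and decode the response.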

Metadata

Property	Value
Implementation	SyncHTTPClient For Cloud
API	`Deployment.get_client()` / `SyncHTTPClient.from_url()`
Source	`src/bentoml/_internal/cloud/deployment.py:L598-631`, `src/bentoml/_internal/client/http.py:L237-288`
Domain	ML_Serving, Cloud_Deployment, API_Design
Workflow	BentoCloud_Deployment
Principle	Principle:Bentoml_BentoML_Cloud_Endpoint_Invocation

Knowledge Sources

2026-02-13 15:00 GMT
