Principle:Bentoml BentoML Cloud Endpoint Invocation

Overview

Cloud Endpoint Invocation is the principle of calling deployed BentoML service endpoints from client applications using auto-generated, type-safe client interfaces.

Concept

Invoking a deployed BentoML service from a client application should be as seamless as calling a local function. The client library introspects the deployed service's API specification and dynamically generates methods that mirror the service's interface.

Theory

Deployed services expose HTTP endpoints that can be called using auto-generated clients. The client introspects the service's OpenAPI spec to provide type-safe method calls, handling serialization/deserialization transparently. This is the same SyncHTTPClient used for local testing but targeted at cloud endpoints. Key advantages include:

Unified client interface - The same SyncHTTPClient class works for both local development servers and cloud-deployed endpoints, reducing the learning curve
Auto-generated methods - Client methods are dynamically created based on the service's OpenAPI specification, ensuring they always match the deployed API
Transparent serialization - Complex data types (NumPy arrays, Pandas DataFrames, images) are automatically serialized and deserialized
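Type safety - Method names and parameter types are derived from the service definition, so calls can be checked against the declared schema. Because the methods are created at runtime, static tooling such as IDE autocompletion may not see them without additional type information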
Authentication handling - When targeting cloud endpoints, the client automatically includes authentication headers from the active BentoCloud context
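
The idea of auto-generated methods can be illustrated with a minimal, standard-library-only sketch. It is not BentoML's implementation: `SketchClient`, `SPEC`, and `fake_transport` are hypothetical names, the spec is a heavily trimmed OpenAPI-style dictionary, and the HTTP round trip is replaced by a stand-in function so that the example runs offline.

```python
from typing import Any, Callable, Dict, Set

# Hypothetical, trimmed OpenAPI-style spec with a single POST route.
SPEC: Dict[str, Any] = {
    "paths": {
        "/summarize": {
            "post": {
                "requestBody": {
                    "content": {
                        "application/json": {
                            "schema": {"properties": {"text": {"type": "string"}}}
                        }
                    }
                }
            }
        }
    }
}

Transport = Callable[[str, Dict[str, Any]], Any]


class SketchClient:
    """Toy client that gains one method per POST route found in the spec."""

    def __init__(self, spec: Dict[str, Any], transport: Transport) -> None:
        self._transport = transport
        for route, operations in spec["paths"].items():
            if "post" not in operations:
                continue
            body = operations["post"]["requestBody"]["content"]["application/json"]
            allowed = set(body["schema"].get("properties", {}))
            # Attach a method named after the route, e.g. "/summarize" -> summarize().
            setattr(self, route.strip("/"), self._make_method(route, allowed))

    def _make_method(self, route: str, allowed: Set[str]) -> Callable[..., Any]:
        def method(**kwargs: Any) -> Any:
            # Reject argument names that the spec does not declare.
            unknown = set(kwargs) - allowed
            if unknown:
                raise TypeError(f"unexpected arguments for {route}: {sorted(unknown)}")
            return self._transport(route, kwargs)

        method.__name__ = route.strip("/")
        return method


def fake_transport(route: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the HTTP round trip: echoes what would have been sent.
    return {"route": route, "payload": payload}


client = SketchClient(SPEC, fake_transport)
result = client.summarize(text="hello")
```

Because the method set is built from the spec at construction time, a route added to the service appears on the client the next time it is created, with no change to the client code itself.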

Invocation Flow

Client creation - The client is instantiated with the deployment URL or retrieved from a Deployment object
API discovery - The client fetches the service's OpenAPI specification from the /docs.json endpoint
Method generation - Python methods are dynamically generated for each API endpoint
Call execution - When a method is called, arguments are serialized, sent as an HTTP request, and the response is deserialized
Result return - The deserialized response is returned as native Python objects
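
The call execution and result return steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the client's actual code: the helper names are hypothetical, the payload is assumed to be JSON, and the token is assumed to travel in an `Authorization: Bearer` header.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    route: str,
    arguments: Dict[str, Any],
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Serialize the call arguments into a POST request (assumes a JSON body)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the endpoint accepts a bearer token in this header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + route,
        data=json.dumps(arguments).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def decode_response(raw: bytes) -> Any:
    """Deserialize the response body back into native Python objects."""
    return json.loads(raw.decode("utf-8"))


def call(
    base_url: str,
    route: str,
    arguments: Dict[str, Any],
    token: Optional[str] = None,
) -> Any:
    """Send the request and return the decoded result (needs a reachable server)."""
    request = build_request(base_url, route, arguments, token)
    with urllib.request.urlopen(request) as response:
        return decode_response(response.read())


# Building the request does not touch the network, so it can be inspected offline.
request = build_request("http://localhost:3000", "/summarize", {"text": "hello"})
```

Keeping request construction separate from sending makes the serialization step easy to inspect and test without a running service.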

Local vs Cloud Invocation

Aspect	Local	Cloud
URL	`http://localhost:3000`	`https://my-deployment.bentoml.com`
Authentication	Not required	API token or deployment token included
Client class	`SyncHTTPClient`	`SyncHTTPClient` (same class)
API discovery	Same mechanism	Same mechanism
Serialization	Same behavior	Same behavior

The consistency between local and cloud invocation enables a smooth development workflow: a service is tested locally and then deployed to the cloud, and the client code changes only in the URL and credentials it is given.
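
One way to keep the calling code identical across both targets is to read the URL and token from configuration. The sketch below assumes hypothetical environment variables `SERVICE_URL` and `SERVICE_TOKEN`, a hypothetical service method `summarize`, and that `bentoml.SyncHTTPClient` accepts a `token` keyword argument and works as a context manager; check these against the installed BentoML version.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Pick the target from configuration; falls back to a local server."""
    return {
        "url": env.get("SERVICE_URL", "http://localhost:3000"),
        "token": env.get("SERVICE_TOKEN"),  # None when targeting a local server
    }


def summarize(text: str, env: Mapping[str, str] = os.environ) -> Any:
    """Call the hypothetical `summarize` endpoint on whichever target is configured."""
    # Imported lazily so the settings helper is usable without BentoML installed.
    import bentoml

    settings = client_settings(env)
    # Assumption: SyncHTTPClient takes the URL plus an optional `token` keyword.
    with bentoml.SyncHTTPClient(settings["url"], token=settings["token"]) as client:
        return client.summarize(text=text)


# Only the configuration differs between the two targets.
local = client_settings({})
cloud = client_settings(
    {"SERVICE_URL": "https://my-deployment.bentoml.com", "SERVICE_TOKEN": "example-token"}
)
```

With this arrangement, switching from the local server to the cloud deployment is a matter of setting two environment variables.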

Metadata

Property	Value
Principle	Cloud Endpoint Invocation
Domain	ML_Serving, Cloud_Deployment, API_Design
Workflow	BentoCloud_Deployment
Related Concepts	OpenAPI, HTTP Clients, RPC, Serialization
Implementation	Implementation:Bentoml_BentoML_SyncHTTPClient_For_Cloud

Knowledge Sources

2026-02-13 15:00 GMT
