Principle:Bentoml BentoML Cloud Endpoint Invocation
Overview
Cloud Endpoint Invocation is the principle of calling deployed BentoML service endpoints from client applications using auto-generated, type-safe client interfaces.
Concept
Invoking deployed BentoML service endpoints from client applications should be as seamless as calling local functions. The client library introspects the deployed service's API specification and dynamically generates methods that mirror the service's interface, making remote calls feel like local method invocations.
Theory
Deployed services expose HTTP endpoints that can be called using auto-generated clients. The client introspects the service's OpenAPI spec to provide type-safe method calls, handling serialization/deserialization transparently. This is the same SyncHTTPClient used for local testing but targeted at cloud endpoints. Key advantages include:
- Unified client interface - The same SyncHTTPClient class works for both local development servers and cloud-deployed endpoints, reducing the learning curve
- Auto-generated methods - Client methods are dynamically created based on the service's OpenAPI specification, ensuring they always match the deployed API
- Transparent serialization - Complex data types (NumPy arrays, Pandas DataFrames, images) are automatically serialized and deserialized
- Type safety - Method signatures and parameter types are derived from the service definition, enabling IDE autocompletion and type checking
- Authentication handling - When targeting protected cloud endpoints, the client attaches authentication headers to each request, using a token passed explicitly or taken from the active BentoCloud context
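As a minimal sketch of the unified interface described above, the function below wraps a single remote call. It assumes a deployed service that exposes an endpoint named `summarize` taking a `text` argument; both names are hypothetical and stand in for whatever the real service defines. The `bentoml` import is done lazily so the snippet loads even where the library is not installed.

```python
from typing import Optional


def summarize_remote(url: str, text: str, token: Optional[str] = None) -> str:
    """Call a hypothetical `summarize` endpoint on a BentoML service."""
    # Lazy import: requires `pip install bentoml` only when the call is made.
    import bentoml

    # `token` is meant for protected cloud deployments; leave it as None
    # for a local development server.
    with bentoml.SyncHTTPClient(url, token=token) as client:
        # `summarize` and `text` are assumed names, not part of BentoML itself.
        return client.summarize(text=text)
```

The same function could be pointed at `http://localhost:3000` during development and at a deployment URL afterwards, changing only the `url` and `token` arguments.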
Invocation Flow
- Client creation - The client is instantiated with the deployment URL or retrieved from a Deployment object
- API discovery - The client fetches the service's OpenAPI specification from the /docs.json endpoint
- Method generation - Python methods are dynamically generated for each API endpoint
- Call execution - When a method is called, arguments are serialized, sent as an HTTP request, and the response is deserialized
- Result return - The deserialized response is returned as native Python objects
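The flow above can be illustrated with a toy client. This is not BentoML's implementation; it is a standard-library sketch of the general technique, assuming a trimmed-down, hypothetical spec and an injected `transport` function that stands in for the HTTP layer. `ToyClient`, `SPEC`, and `echo_transport` are all illustrative names.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical, heavily trimmed OpenAPI-style document with two endpoints.
SPEC: Dict[str, Any] = {
    "paths": {
        "/summarize": {"post": {}},
        "/classify": {"post": {}},
    }
}

# A transport takes (path, request body) and returns the raw response body.
Transport = Callable[[str, bytes], bytes]


class ToyClient:
    def __init__(self, spec: Dict[str, Any], transport: Transport) -> None:
        self._transport = transport
        # Method generation: attach one callable per POST path in the spec.
        for path, operations in spec["paths"].items():
            if "post" in operations:
                setattr(self, path.strip("/"), self._make_method(path))

    def _make_method(self, path: str) -> Callable[..., Any]:
        def method(**kwargs: Any) -> Any:
            body = json.dumps(kwargs).encode("utf-8")  # serialize arguments
            raw = self._transport(path, body)          # "send" the request
            return json.loads(raw.decode("utf-8"))     # deserialize the response

        method.__name__ = path.strip("/")
        return method


def echo_transport(path: str, body: bytes) -> bytes:
    """Stand-in for the network: echoes the path and the parsed payload."""
    payload = json.loads(body)
    return json.dumps({"path": path, "received": payload}).encode("utf-8")


client = ToyClient(SPEC, echo_transport)
result = client.summarize(text="hello")
```

Injecting the transport keeps the sketch runnable offline; a real client would replace `echo_transport` with an HTTP call and would derive argument types from the spec rather than accepting arbitrary keyword arguments.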
Local vs Cloud Invocation
| Aspect | Local | Cloud |
|---|---|---|
| URL | http://localhost:3000 | https://my-deployment.bentoml.com |
| Authentication | Not required | API token or deployment token included |
| Client class | SyncHTTPClient | SyncHTTPClient (same class) |
| API discovery | Same mechanism | Same mechanism |
| Serialization | Same behavior | Same behavior |
The consistency between local and cloud invocation enables a smooth development workflow where services are tested locally and deployed to the cloud without changing client code.
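At the HTTP level, the comparison in the table comes down to a different base URL and an extra header. The sketch below builds, but does not send, the two requests using only the standard library. It assumes a JSON `POST` to a path named after the endpoint and a bearer-token `Authorization` header; the `summarize` endpoint, the payload, and the token value are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    endpoint: str,
    payload: Dict[str, Any],
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) a JSON POST request for a service endpoint."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumed scheme: bearer token in the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = {"text": "hi"}
local = build_request("http://localhost:3000", "summarize", payload)
cloud = build_request(
    "https://my-deployment.bentoml.com", "summarize", payload, token="example-token"
)
```

The request body is identical in both cases; only the host and the presence of the `Authorization` header differ, which is why client code does not need to change between environments.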
Metadata
| Property | Value |
|---|---|
| Principle | Cloud Endpoint Invocation |
| Domain | ML_Serving, Cloud_Deployment, API_Design |
| Workflow | BentoCloud_Deployment |
| Related Concepts | OpenAPI, HTTP Clients, RPC, Serialization |
| Implementation | Implementation:Bentoml_BentoML_SyncHTTPClient_For_Cloud |