Principle:Bentoml BentoML Cloud Endpoint Invocation

Overview

Cloud Endpoint Invocation is the principle of calling deployed BentoML service endpoints from client applications using auto-generated, type-safe client interfaces.

Concept

Invoking deployed BentoML service endpoints from client applications should be as seamless as calling local functions. The client library introspects the deployed service's API specification and dynamically generates methods that mirror the service's interface, making remote calls feel like local method invocations.

Theory

Deployed services expose HTTP endpoints that can be called through auto-generated clients. The client introspects the service's OpenAPI spec to provide type-safe method calls and handles serialization and deserialization transparently. It is the same SyncHTTPClient used for local testing, pointed at a cloud endpoint instead of a local server. Key advantages include:

  • Unified client interface - The same SyncHTTPClient class works for both local development servers and cloud-deployed endpoints, reducing the learning curve
  • Auto-generated methods - Client methods are dynamically created based on the service's OpenAPI specification, ensuring they always match the deployed API
  • Transparent serialization - Complex data types (NumPy arrays, Pandas DataFrames, images) are automatically serialized and deserialized
  • Type safety - Method signatures and parameter types are derived from the service definition, enabling IDE autocompletion and type checking
  • Authentication handling - When targeting cloud endpoints, the client automatically includes authentication headers from the active BentoCloud context
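
The serialization and authentication points above can be illustrated with a minimal, standard-library sketch of how one call might be turned into an HTTP request. This is not BentoML's implementation: `build_request`, the `summarize` route, and the bearer-token header scheme are assumptions made for illustration only.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional


def build_request(
    base_url: str,
    route: str,
    payload: Mapping[str, Any],
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Serialize keyword arguments to JSON and attach an optional token.

    Hypothetical helper: a real client does this internally.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: protected endpoints accept a bearer token header.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(dict(payload)).encode("utf-8")
    url = base_url.rstrip("/") + "/" + route.lstrip("/")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Same call shape for both targets; only the URL and token differ.
local_req = build_request(
    "http://localhost:3000", "summarize", {"text": "hello"}
)
cloud_req = build_request(
    "https://my-deployment.bentoml.com",
    "summarize",
    {"text": "hello"},
    token="example-token",  # placeholder value, not a real credential
)
```

Neither request is sent here; the point is that the caller passes plain Python values and the client layer decides how they are encoded and which headers accompany them.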

Invocation Flow

  1. Client creation - The client is instantiated with the deployment URL or retrieved from a Deployment object
  2. API discovery - The client fetches the service's OpenAPI specification from the /docs.json endpoint
  3. Method generation - Python methods are dynamically generated for each API endpoint
  4. Call execution - When a method is called, arguments are serialized, sent as an HTTP request, and the response is deserialized
  5. Result return - The deserialized response is returned as native Python objects
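
The five steps above can be sketched in a few lines. The example below is a simplified illustration rather than BentoML's client: `SketchClient`, `fake_transport`, the trimmed spec, and the `summarize` route are all hypothetical, and an in-memory transport replaces real HTTP so the sketch runs offline.

```python
import json
from typing import Any, Callable

# (HTTP method, URL, request body) -> response body
Transport = Callable[[str, str, bytes], bytes]


def fake_transport(method: str, url: str, body: bytes) -> bytes:
    """In-memory stand-in for a deployed service (no network involved)."""
    if method == "GET" and url.endswith("/docs.json"):
        # Hypothetical, heavily trimmed OpenAPI document with one route.
        return json.dumps({"paths": {"/summarize": {"post": {}}}}).encode()
    if method == "POST" and url.endswith("/summarize"):
        text = json.loads(body)["text"]
        # Toy "summary": keep only the first sentence.
        return json.dumps(text.split(".")[0] + ".").encode()
    raise ValueError(f"no route for {method} {url}")


class SketchClient:
    def __init__(self, url: str, transport: Transport) -> None:
        # Step 1: client creation with the deployment URL.
        self._url = url.rstrip("/")
        self._transport = transport
        # Step 2: API discovery from the spec endpoint.
        spec = json.loads(transport("GET", self._url + "/docs.json", b""))
        # Step 3: method generation, one Python method per POST route.
        for path, operations in spec["paths"].items():
            if "post" in operations:
                setattr(self, path.strip("/"), self._make_method(path))

    def _make_method(self, path: str) -> Callable[..., Any]:
        def call(**kwargs: Any) -> Any:
            # Step 4: serialize the arguments and send the request.
            raw = self._transport(
                "POST", self._url + path, json.dumps(kwargs).encode()
            )
            # Step 5: deserialize the response into native Python objects.
            return json.loads(raw)

        return call


client = SketchClient("http://localhost:3000", fake_transport)
summary = client.summarize(text="BentoML serves models. It also deploys them.")
```

Because the methods are attached at construction time from whatever the spec lists, the client never hard-codes route names; swapping `fake_transport` for a real HTTP function would leave the calling code unchanged.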

Local vs Cloud Invocation

Aspect          | Local                  | Cloud
URL             | http://localhost:3000  | https://my-deployment.bentoml.com
Authentication  | Not required           | API token or deployment token included
Client class    | SyncHTTPClient         | SyncHTTPClient (same class)
API discovery   | Same mechanism         | Same mechanism
Serialization   | Same behavior          | Same behavior

The consistency between local and cloud invocation enables a smooth development workflow where services are tested locally and deployed to the cloud without changing client code.
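
As a rough sketch of that workflow, the client code below stays identical for both targets and only the URL and token change. The environment variable names `SERVICE_URL` and `SERVICE_TOKEN`, the `resolve_target` helper, and the `summarize` endpoint are hypothetical; the `bentoml.SyncHTTPClient(url, token=...)` call follows the documented usage as understood here and should be checked against the installed BentoML version.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_target(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Pick the endpoint URL and token from a mapping such as os.environ.

    SERVICE_URL and SERVICE_TOKEN are hypothetical variable names.
    """
    url = env.get("SERVICE_URL", "http://localhost:3000")
    token = env.get("SERVICE_TOKEN") or None  # local servers need no token
    return url, token


def summarize(text: str, env: Mapping[str, str] = os.environ) -> str:
    # Imported lazily so resolve_target works without BentoML installed.
    import bentoml

    url, token = resolve_target(env)
    with bentoml.SyncHTTPClient(url, token=token) as client:
        # `summarize` is a hypothetical endpoint on the deployed service.
        return client.summarize(text=text)
```

With no variables set, `summarize("...")` would target the local development server; setting `SERVICE_URL` and `SERVICE_TOKEN` redirects the same function to a cloud deployment.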

Metadata

Property          | Value
Principle         | Cloud Endpoint Invocation
Domain            | ML_Serving, Cloud_Deployment, API_Design
Workflow          | BentoCloud_Deployment
Related Concepts  | OpenAPI, HTTP Clients, RPC, Serialization
Implementation    | Implementation:Bentoml_BentoML_SyncHTTPClient_For_Cloud

Knowledge Sources

2026-02-13 15:00 GMT
