Implementation:Bentoml BentoML Client Base

From Leeroopedia
Knowledge Sources
Domains Client, Networking, Abstract Base Classes
Last Updated 2026-02-13 15:00 GMT

Overview

Defines the abstract base classes for BentoML service clients, providing the Client (deprecated), SyncClient, and AsyncClient ABCs that implement auto-discovery of API endpoints and support both HTTP and gRPC transports.

Description

This module establishes the client abstraction layer for communicating with BentoML services. It provides three client base classes:

  1. Client (deprecated): The original client class, which wraps both sync and async clients. It is deprecated in favor of SyncClient and AsyncClient and will be removed in BentoML 2.0. Key features:
    • Accepts a Service object and server URL in its constructor.
    • Dynamically creates a method for each API endpoint name via functools.partial.
    • Provides both call() (sync) and async_call() (async) dispatch methods.
    • Implements the sync and async context manager protocols (__enter__/__exit__ and __aenter__/__aexit__).
    • Provides a static factory method, from_url(), with overloads for the "http", "grpc", and "auto" transport kinds.
  2. AsyncClient: The abstract async client base class for making asynchronous calls.
    • Constructor discovers all API endpoints from the Service object and creates bound partial methods.
    • call(bentoml_api_name, inp, **kwargs): Public method to invoke a named API.
    • _call(inp, *, _bentoml_api, **kwargs): Abstract method that subclasses (HTTP, gRPC) must implement.
    • wait_until_server_ready(host, port, timeout): Static method that tries HTTP first and falls back to gRPC on BadStatusLine.
    • from_url(server_url, *, kind): Class factory with auto-detection that tries HTTP first and falls back to gRPC.
  3. SyncClient: The abstract synchronous client base class, mirroring AsyncClient with blocking methods.
    • Same endpoint discovery and method creation pattern.
    • call() and abstract _call() for synchronous invocation.
    • wait_until_server_ready() with HTTP-first, gRPC-fallback logic.
    • from_url() with auto-detection.

All three classes raise BentoMLException in two cases: when no APIs are found during construction, and when from_url() is given an invalid transport kind.
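The endpoint discovery and method creation pattern described above can be sketched without BentoML installed. This is a minimal illustration, not the library's code: SketchSyncClient, EchoClient, NoAPIsError, and the _api_name keyword are hypothetical names, and a plain list of strings stands in for the Service object's API definitions.

```python
import functools
from abc import ABC, abstractmethod
from typing import Any


class NoAPIsError(Exception):
    """Hypothetical stand-in for BentoMLException."""


class SketchSyncClient(ABC):
    """Illustrative base class; not the BentoML implementation."""

    def __init__(self, api_names: list[str], server_url: str):
        # Mirror the documented behavior: refuse to build a client with no APIs.
        if not api_names:
            raise NoAPIsError("No APIs were found when constructing the client.")
        self.server_url = server_url
        self.endpoints = list(api_names)
        for name in self.endpoints:
            # Bind the endpoint name so client.<name>(inp) forwards to _call.
            setattr(self, name, functools.partial(self._call, _api_name=name))

    def call(self, api_name: str, inp: Any = None, **kwargs: Any) -> Any:
        # Public entry point: validate the name, then dispatch to the transport.
        if api_name not in self.endpoints:
            raise NoAPIsError(f"Unknown API endpoint: {api_name!r}")
        return self._call(inp, _api_name=api_name, **kwargs)

    @abstractmethod
    def _call(self, inp: Any = None, *, _api_name: str, **kwargs: Any) -> Any:
        """Transport-specific subclasses implement the actual request."""


class EchoClient(SketchSyncClient):
    """Toy transport that returns its arguments instead of doing I/O."""

    def _call(self, inp: Any = None, *, _api_name: str, **kwargs: Any) -> Any:
        return {"api": _api_name, "input": inp, "url": self.server_url}


client = EchoClient(["predict"], "http://localhost:3000")
print(client.predict([1, 2, 3]))          # dynamically created method
print(client.call("predict", [1, 2, 3]))  # explicit dispatch by name
```

Both calls reach the same _call implementation, which is why a concrete transport only needs to override that one method.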

Usage

These base classes are not instantiated directly. Concrete subclasses (SyncHTTPClient, AsyncHTTPClient, SyncGrpcClient, AsyncGrpcClient) implement the abstract _call method for their respective transport. Users typically create clients via SyncClient.from_url() or AsyncClient.from_url(), which auto-detect the transport protocol when kind is omitted or set to "auto".
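The transport selection performed by from_url() might look roughly like the following sketch. It assumes the two concrete transports are passed in as factories (http_factory and grpc_factory are hypothetical parameters), uses a hypothetical TransportError in place of BentoMLException, and treats any exception from the HTTP factory as the trigger for the gRPC fallback; the exact condition the library uses may be narrower.

```python
from typing import Callable, Optional


class TransportError(Exception):
    """Hypothetical stand-in for BentoMLException."""


def from_url_sketch(
    server_url: str,
    *,
    kind: Optional[str] = None,
    http_factory: Callable[[str], object],
    grpc_factory: Callable[[str], object],
) -> object:
    """Pick a transport for server_url; illustrative only."""
    if kind is None or kind == "auto":
        try:
            # Auto-detection tries HTTP first.
            return http_factory(server_url)
        except Exception:
            # Assumption: any HTTP failure falls back to gRPC.
            return grpc_factory(server_url)
    if kind == "http":
        return http_factory(server_url)
    if kind == "grpc":
        return grpc_factory(server_url)
    raise TransportError(
        f"Invalid client kind {kind!r}; expected 'auto', 'http', or 'grpc'."
    )


def failing_http(url: str) -> object:
    # Simulates a server that does not speak HTTP.
    raise ConnectionError("not an HTTP server")


picked = from_url_sketch(
    "localhost:3000",
    http_factory=failing_http,
    grpc_factory=lambda url: ("grpc", url),
)
print(picked)
```

Passing the factories as arguments keeps the sketch free of network access; a real client would construct its HTTP or gRPC subclass at those points.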

Code Reference

Source Location

Signature

class Client(ABC):
    server_url: str
    _svc: Service
    endpoints: list[str]

    def __init__(self, svc: Service, server_url: str): ...
    def call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    async def async_call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    @staticmethod
    def from_url(server_url: str, *, kind: t.Literal["auto", "http", "grpc"] | None = None, **kwargs: t.Any) -> Client: ...

class AsyncClient(ABC):
    def __init__(self, svc: Service, server_url: str): ...
    async def call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    @abstractmethod
    async def _call(self, inp: t.Any = None, *, _bentoml_api: InferenceAPI[t.Any], **kwargs: t.Any) -> t.Any: ...
    @classmethod
    async def from_url(cls, server_url: str, *, kind: t.Literal["auto", "http", "grpc"] | None = None, **kwargs: t.Any) -> AsyncClient: ...

class SyncClient(Client):
    def __init__(self, svc: Service, server_url: str): ...
    def call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    @abstractmethod
    def _call(self, inp: t.Any = None, *, _bentoml_api: InferenceAPI[t.Any], **kwargs: t.Any) -> t.Any: ...
    @classmethod
    def from_url(cls, server_url: str, *, kind: t.Literal["auto", "http", "grpc"] | None = None, **kwargs: t.Any) -> SyncClient: ...

Import

from bentoml._internal.client import Client, AsyncClient, SyncClient

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| svc | Service | Yes | A BentoML Service object containing API endpoint definitions |
| server_url | str | Yes | URL of the running BentoML service (e.g., "http://localhost:3000") |
| kind | Literal["auto", "http", "grpc"] or None | No | Transport protocol to use; "auto" tries HTTP first, then gRPC |
| bentoml_api_name | str | Yes (for call) | Name of the API endpoint to invoke |
| inp | Any | No | Input data to send to the API endpoint |

Outputs

| Name | Type | Description |
|------|------|-------------|
| result | Any | The deserialized response from the service API endpoint |
| Client instance | Client / AsyncClient / SyncClient | A connected client ready to make API calls |

Usage Examples

import asyncio

from bentoml._internal.client import AsyncClient, SyncClient

# Synchronous client with auto-detection of the transport
client = SyncClient.from_url("http://localhost:3000")
result = client.call("predict", {"data": [1, 2, 3]})
client.close()

# Using a context manager; each endpoint is also exposed as a method
with SyncClient.from_url("http://localhost:3000") as client:
    result = client.predict({"data": [1, 2, 3]})

# Async client; `await` and `async with` must run inside a coroutine
async def main():
    async with await AsyncClient.from_url("http://localhost:3000") as client:
        return await client.call("predict", {"data": [1, 2, 3]})

result = asyncio.run(main())

# Wait for server readiness
SyncClient.wait_until_server_ready("localhost", 3000, timeout=60)
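The wait_until_server_ready call in the example above is described as trying HTTP first and falling back to gRPC on BadStatusLine. A rough sketch of that idea, under stated assumptions, follows: the probes are injected as callables (http_probe and grpc_probe are hypothetical), the function polls until a deadline, and it returns the name of the transport that answered. The real method's retry policy and return value may differ.

```python
import time
from http.client import BadStatusLine
from typing import Callable


def wait_until_ready_sketch(
    http_probe: Callable[[], bool],
    grpc_probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.1,
) -> str:
    """Poll until a probe succeeds; return which transport answered."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if http_probe():
                return "http"
        except BadStatusLine:
            # A malformed HTTP status line suggests the port speaks gRPC.
            if grpc_probe():
                return "grpc"
        except ConnectionError:
            # Server is not accepting connections yet; retry after a pause.
            pass
        time.sleep(interval)
    raise TimeoutError(f"Server not ready after {timeout} seconds.")


def not_http() -> bool:
    # Simulates a gRPC server replying to an HTTP request with non-HTTP bytes.
    raise BadStatusLine("\x00\x00\x18\x04")


print(wait_until_ready_sketch(not_http, lambda: True, timeout=1.0))
```

Injecting the probes keeps the loop testable without a running server; in practice each probe would issue a health-check request over its transport.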
