Implementation:Bentoml BentoML Client Base

Knowledge Sources	Bentoml_BentoML
Domains	Client, Networking, Abstract Base Classes
Last Updated	2026-02-13 15:00 GMT

Overview

Defines the abstract base classes for BentoML service clients: Client (deprecated), SyncClient, and AsyncClient. Each class discovers the service's API endpoints automatically and supports both HTTP and gRPC transports.

Description

This module establishes the client abstraction layer for communicating with BentoML services. It provides three client base classes:

Client (deprecated): The original client class that wraps both sync and async clients. It is deprecated in favor of SyncClient and AsyncClient and will be removed in BentoML 2.0. Key features:
- Accepts a Service object and server URL in its constructor.
- Dynamically creates methods matching each API endpoint name via functools.partial.
- Provides both call() (sync) and async_call() (async) dispatch methods.
- Implements context manager protocol (__enter__/__exit__ and __aenter__/__aexit__).
- Static factory method from_url() with overloads for "http", "grpc", and "auto" transport kinds.
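
The dynamic method creation described above can be sketched as follows. This is a minimal illustration that does not import BentoML: SketchClient, its api_names argument, and the dictionary returned by call() are hypothetical stand-ins, and the real class binds InferenceAPI objects from a Service rather than plain strings.

```python
import functools


class SketchClient:
    """Toy stand-in for a client base class; not BentoML's actual code."""

    def __init__(self, api_names):
        if not api_names:
            raise ValueError("no APIs were found")
        self.endpoints = list(api_names)
        for name in self.endpoints:
            # Bind the endpoint name so client.<name>(inp) forwards to call().
            setattr(self, name, functools.partial(self.call, name))

    def call(self, api_name, inp=None, **kwargs):
        # A real client would serialize `inp` and send it over HTTP or gRPC.
        return {"api": api_name, "input": inp, "options": kwargs}


client = SketchClient(["predict", "classify"])
result_attr = client.predict([1, 2, 3])          # attribute-style call
result_call = client.call("predict", [1, 2, 3])  # explicit dispatch
```

Both call styles reach the same dispatch method, which is why client.predict(...) and client.call("predict", ...) are interchangeable in the usage examples on this page.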

AsyncClient: The abstract async client base class for making asynchronous calls.
- Constructor discovers all API endpoints from the Service object and creates bound partial methods.
- call(bentoml_api_name, inp, **kwargs): Public method to invoke a named API.
- _call(inp, *, _bentoml_api, **kwargs): Abstract method that subclasses (HTTP, gRPC) must implement.
- wait_until_server_ready(host, port, timeout): Static method that tries HTTP first, falls back to gRPC on BadStatusLine.
- from_url(server_url, *, kind): Class factory with auto-detection that tries HTTP first, falls back to gRPC.
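
The HTTP-first, gRPC-fallback behaviour can be sketched as below. The sketch makes several assumptions: connect_http and connect_grpc are hypothetical placeholders for the concrete transports, kind=None is treated like "auto", BadStatusLine is used as the fallback trigger for from_url as it is for wait_until_server_ready, and ValueError stands in for BentoMLException.

```python
import asyncio
from http.client import BadStatusLine


async def connect_http(server_url):
    # Hypothetical probe: pretend the server replied with a non-HTTP payload.
    raise BadStatusLine("not an HTTP response")


async def connect_grpc(server_url):
    # Hypothetical gRPC connector that succeeds.
    return ("grpc", server_url)


async def from_url(server_url, *, kind=None):
    if kind is None or kind == "auto":
        try:
            return await connect_http(server_url)
        except BadStatusLine:
            # The endpoint did not speak HTTP, so retry over gRPC.
            return await connect_grpc(server_url)
    if kind == "http":
        return await connect_http(server_url)
    if kind == "grpc":
        return await connect_grpc(server_url)
    # Stand-in for the BentoMLException raised on an invalid kind.
    raise ValueError(f"invalid client kind {kind!r}")


detected = asyncio.run(from_url("localhost:3000"))
```

Passing kind="http" or kind="grpc" explicitly skips the probe, which avoids a failed HTTP attempt when the transport is already known.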

SyncClient: The abstract synchronous client base class, mirroring AsyncClient with blocking methods.
- Same endpoint discovery and method creation pattern.
- call() and abstract _call() for synchronous invocation.
- wait_until_server_ready() with HTTP-first, gRPC-fallback logic.
- from_url() with auto-detection.

All three classes raise BentoMLException in two cases: the constructor raises it when the Service defines no APIs, and from_url() raises it when given an invalid transport kind.

Usage

These base classes are not used directly. Concrete subclasses (SyncHTTPClient, AsyncHTTPClient, SyncGrpcClient, AsyncGrpcClient) implement the abstract _call method for their respective transport. Users typically create clients via SyncClient.from_url() or AsyncClient.from_url(), which auto-detect the transport protocol.
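
The split between the public call() method and the abstract _call() hook can be sketched as follows. SketchSyncClient, EchoClient, and the _api_name keyword are hypothetical names chosen for illustration; the real base classes pass an InferenceAPI object as _bentoml_api and perform actual network I/O.

```python
from abc import ABC, abstractmethod


class SketchSyncClient(ABC):
    """Toy base class: call() is public, _call() is the transport hook."""

    def __init__(self, server_url):
        self.server_url = server_url

    def call(self, api_name, inp=None, **kwargs):
        # Public entry point; delegates transport work to the subclass.
        return self._call(inp, _api_name=api_name, **kwargs)

    @abstractmethod
    def _call(self, inp=None, *, _api_name, **kwargs):
        ...


class EchoClient(SketchSyncClient):
    """Hypothetical transport that formats the request instead of sending it."""

    def _call(self, inp=None, *, _api_name, **kwargs):
        return f"{self.server_url}/{_api_name} <- {inp!r}"


try:
    SketchSyncClient("http://localhost:3000")
    instantiated_base = True
except TypeError:
    # ABCs with unimplemented abstract methods cannot be instantiated.
    instantiated_base = False

reply = EchoClient("http://localhost:3000").call("predict", [1, 2, 3])
```

Because the base class cannot be instantiated, callers always end up with a concrete transport, which is what the from_url() factories return.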

Code Reference

Source Location

Repository: Bentoml_BentoML
File: src/bentoml/_internal/client/__init__.py
Lines: 1-371

Signature

class Client(ABC):
    server_url: str
    _svc: Service
    endpoints: list[str]

    def __init__(self, svc: Service, server_url: str): ...
    def call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    async def async_call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    @staticmethod
    def from_url(server_url: str, *, kind: t.Literal["auto", "http", "grpc"] | None = None, **kwargs: t.Any) -> Client: ...

class AsyncClient(ABC):
    def __init__(self, svc: Service, server_url: str): ...
    async def call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    @abstractmethod
    async def _call(self, inp: t.Any = None, *, _bentoml_api: InferenceAPI[t.Any], **kwargs: t.Any) -> t.Any: ...
    @classmethod
    async def from_url(cls, server_url: str, *, kind: t.Literal["auto", "http", "grpc"] | None = None, **kwargs: t.Any) -> AsyncClient: ...

class SyncClient(ABC):
    def __init__(self, svc: Service, server_url: str): ...
    def call(self, bentoml_api_name: str, inp: t.Any = None, **kwargs: t.Any) -> t.Any: ...
    @abstractmethod
    def _call(self, inp: t.Any = None, *, _bentoml_api: InferenceAPI[t.Any], **kwargs: t.Any) -> t.Any: ...
    @classmethod
    def from_url(cls, server_url: str, *, kind: t.Literal["auto", "http", "grpc"] | None = None, **kwargs: t.Any) -> SyncClient: ...

Import

from bentoml._internal.client import Client, AsyncClient, SyncClient

I/O Contract

Inputs

Name	Type	Required	Description
svc	Service	Yes	A BentoML Service object containing API endpoint definitions
server_url	str	Yes	URL of the running BentoML service (e.g., "http://localhost:3000")
kind	Literal["auto", "http", "grpc"] or None	No	Transport protocol to use; "auto" tries HTTP first, then gRPC
bentoml_api_name	str	Yes (for call)	Name of the API endpoint to invoke
inp	Any	No	Input data to send to the API endpoint

Outputs

Name	Type	Description
result	Any	The deserialized response from the service API endpoint
Client instance	Client / AsyncClient / SyncClient	A connected client ready to make API calls

Usage Examples

import asyncio

from bentoml._internal.client import SyncClient, AsyncClient

# Synchronous client with auto-detection
client = SyncClient.from_url("http://localhost:3000")
result = client.call("predict", {"data": [1, 2, 3]})
client.close()

# Using context manager; each endpoint is also exposed as a method
with SyncClient.from_url("http://localhost:3000") as client:
    result = client.predict({"data": [1, 2, 3]})

# Async client; "async with" and "await" must run inside a coroutine
async def main():
    async with await AsyncClient.from_url("http://localhost:3000") as client:
        return await client.call("predict", {"data": [1, 2, 3]})

result = asyncio.run(main())

# Wait for server readiness
SyncClient.wait_until_server_ready("localhost", 3000, timeout=60)
