Implementation:Bentoml BentoML Server Classes

From Leeroopedia
Knowledge Sources
Domains Serving, Server Management
Last Updated 2026-02-13 15:00 GMT

Overview

Provides the deprecated programmatic server classes Server, HTTPServer, and GrpcServer for launching and managing BentoML serving processes as subprocesses and connecting to them through typed clients.

Description

The server.py module defines an abstract base class Server and two concrete implementations, HTTPServer and GrpcServer, that allow users to programmatically launch BentoML serving processes. Each server wraps a subprocess invocation of the BentoML CLI serve command, manages the process lifecycle (start, wait-until-ready, stop), and provides typed client access (HTTPClient or GrpcClient) for communicating with the running server.

The module is marked as deprecated at import time; users are directed to use bentoml.serve() instead. The Server base class is generic over a ClientType bound to Client, supporting both HTTP and gRPC protocols. The start() method returns a context manager that yields the appropriate client, and the stop() method handles graceful and forceful subprocess termination. The HTTPServer additionally supports SSL configuration and timeout parameters, while GrpcServer supports gRPC-specific options such as reflection, channelz, max concurrent streams, and protocol version.
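The graceful-then-forceful termination described above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not BentoML's implementation; the function name stop_process and the timeout value are hypothetical:

```python
import subprocess
import sys

def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the child to exit gracefully; kill it if it does not comply."""
    if proc.poll() is not None:
        return proc.returncode      # already exited
    proc.terminate()                # graceful: SIGTERM (TerminateProcess on Windows)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                 # forceful: SIGKILL
        proc.wait()
    return proc.returncode

# Demonstrate with a long-running child process.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stop_process(proc)
print(proc.poll() is not None)      # True: the child has exited
```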

Usage

Use these classes when you need to start a BentoML server programmatically from Python code (for example, in integration tests or notebooks) and interact with it via a client. Prefer bentoml.serve() for new code, as this module is deprecated.

Code Reference

Source Location

Signature

class Server(ABC, t.Generic[ClientType]):
    def __init__(
        self,
        servable: str | Bento | Tag | Service | NewService[t.Any],
        serve_cmd: str,
        reload: bool,
        production: bool,
        env: t.Literal["conda"] | None,
        host: str,
        port: int,
        working_dir: str | None,
        api_workers: int | None,
        backlog: int,
        bento: str | Bento | Tag | Service | None = None,
        timeout: float = 10,
    ): ...

    def start(
        self,
        blocking: bool = False,
        env: dict[str, str] | None = None,
        stdin: _FILE = None,
        stdout: _FILE = None,
        stderr: _FILE = None,
        text: bool | None = None,
    ) -> t.ContextManager[ClientType]: ...

    def get_client(self) -> ClientType: ...
    def stop(self) -> None: ...

class HTTPServer(Server[HTTPClient]):
    def __init__(
        self,
        bento: str | Bento | Tag | Service,
        reload: bool = False,
        production: bool = True,
        env: t.Literal["conda"] | None = None,
        host: str = ...,
        port: int = ...,
        timeout: float = 10,
        working_dir: str | None = None,
        api_workers: int | None = ...,
        backlog: int = ...,
        ssl_certfile: str | None = ...,
        ssl_keyfile: str | None = ...,
        ssl_keyfile_password: str | None = ...,
        ssl_version: int | None = ...,
        ssl_cert_reqs: int | None = ...,
        ssl_ca_certs: str | None = ...,
        ssl_ciphers: str | None = ...,
        timeout_keep_alive: int | None = None,
        timeout_graceful_shutdown: int | None = None,
    ): ...

class GrpcServer(Server[GrpcClient]):
    def __init__(
        self,
        bento: str | Bento | Tag | Service,
        reload: bool = False,
        production: bool = True,
        env: t.Literal["conda"] | None = None,
        host: str = ...,
        port: int = ...,
        timeout: float = 10,
        working_dir: str | None = None,
        api_workers: int | None = ...,
        backlog: int = ...,
        enable_reflection: bool = ...,
        enable_channelz: bool = ...,
        max_concurrent_streams: int | None = ...,
        ssl_certfile: str | None = ...,
        ssl_keyfile: str | None = ...,
        ssl_ca_certs: str | None = ...,
        grpc_protocol_version: str | None = None,
    ): ...

Import

from bentoml.server import HTTPServer, GrpcServer, Server

I/O Contract

Inputs

Name Type Required Description
servable str | Bento | Tag | Service | NewService Yes The BentoML service or bento tag to serve
host str No The host address to bind (default from BentoMLContainer config)
port int No The port number to bind (default from BentoMLContainer config)
reload bool No Enable auto-reload on code changes (default False)
production bool No Run in production mode (default True)
working_dir str | None No Working directory for the service
api_workers int | None No Number of API worker processes
backlog int No Socket backlog size
timeout float No Timeout in seconds to wait for server readiness (default 10)
ssl_certfile str | None No Path to SSL certificate file (HTTPServer/GrpcServer)
ssl_keyfile str | None No Path to SSL key file (HTTPServer/GrpcServer)
blocking bool No If True, start() blocks until server stops (default False)

Outputs

Name Type Description
start() return t.ContextManager[ClientType] Context manager yielding HTTPClient or GrpcClient
get_client() return HTTPClient or GrpcClient Client connected to the running server

Usage Examples

# Start an HTTP server and interact with it via a client
from bentoml.server import HTTPServer

# Placeholder payload; replace with input matching your service's API signature
input_data = {"text": "hello"}

server = HTTPServer("my_service:latest", host="127.0.0.1", port=3000)
with server.start() as client:
    response = client.call("predict", input_data)  # "predict" is your API name

# Start a gRPC server
from bentoml.server import GrpcServer

grpc_server = GrpcServer("my_service:latest", host="127.0.0.1", port=50051)
with grpc_server.start() as client:
    response = client.call("predict", input_data)

# Start in blocking mode (blocks until the server process exits, e.g. on CTRL+C)
server = HTTPServer("my_service:latest")
server.start(blocking=True)
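The start()/stop() lifecycle behind these examples reduces to a context manager wrapped around a subprocess. The following is a minimal sketch of that pattern using only the standard library; DummyServer is a hypothetical stand-in that yields the raw process where the real classes yield an HTTPClient or GrpcClient:

```python
import contextlib
import subprocess
import sys

class DummyServer:
    """Sketch of the Server.start() pattern: launch a subprocess,
    yield a handle while it runs, and stop it on context exit."""

    def __init__(self, cmd: list[str], timeout: float = 5.0):
        self.cmd = cmd
        self.timeout = timeout
        self.process = None

    @contextlib.contextmanager
    def start(self):
        self.process = subprocess.Popen(self.cmd)
        try:
            yield self.process      # the real classes yield a connected client
        finally:
            self.stop()

    def stop(self) -> None:
        if self.process is None or self.process.poll() is not None:
            return
        self.process.terminate()    # graceful shutdown request
        try:
            self.process.wait(timeout=self.timeout)
        except subprocess.TimeoutExpired:
            self.process.kill()     # forceful fallback
            self.process.wait()

server = DummyServer([sys.executable, "-c", "import time; time.sleep(60)"])
with server.start() as proc:
    running = proc.poll() is None   # True while the child is alive
print(running, proc.poll() is not None)
```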
