Implementation:Bentoml BentoML Server Classes

Knowledge Sources	Bentoml_BentoML
Domains	Serving, Server Management
Last Updated	2026-02-13 15:00 GMT

Overview

Provides deprecated programmatic server classes (Server, HTTPServer, GrpcServer) that start and manage BentoML serving processes as subprocesses and expose a matching client for each running server.

Description

The server.py module defines an abstract base class Server and two concrete implementations, HTTPServer and GrpcServer, that allow users to programmatically launch BentoML serving processes. Each server wraps a subprocess invocation of the BentoML CLI serve command, manages the process lifecycle (start, wait-until-ready, stop), and provides typed client access (HTTPClient or GrpcClient) for communicating with the running server.
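The wait-until-ready step can be pictured as a polling loop. The sketch below is a generic illustration and not BentoML's implementation: the helper name wait_until_ready, the interval parameter, and the use of a plain TCP connect as the readiness signal are all assumptions (the real module may check readiness differently, for example through its client).

```python
import socket
import time


def wait_until_ready(host: str, port: int, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Hypothetical helper: poll host:port until it accepts a TCP connection.

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have elapsed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); back off briefly and retry.
            time.sleep(interval)
    return False
```

The timeout argument plays the same role as the timeout parameter in the signatures on this page: it bounds how long the caller waits before giving up on the server.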

The module is marked as deprecated at import time; users are directed to use bentoml.serve() instead. The Server base class is generic over a ClientType bound to Client, so a single lifecycle implementation serves both the HTTP and gRPC protocols. The start() method returns a context manager that yields the appropriate client, and the stop() method handles both graceful and forceful termination of the subprocess. HTTPServer additionally accepts the full set of SSL options and the keep-alive and graceful-shutdown timeouts, while GrpcServer accepts a smaller set of SSL options (ssl_certfile, ssl_keyfile, ssl_ca_certs) plus gRPC-specific options such as reflection, channelz, max concurrent streams, and protocol version.
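To make the generic design concrete, here is a minimal sketch of a subprocess wrapper that is generic over its client type, returns a context manager from start(), and stops the process gracefully before resorting to force. It is a stand-in written for illustration, not the code in server.py: the class name SubprocessServer, the make_client callable, the grace parameter, and the terminate-wait-kill ordering are all assumptions.

```python
import contextlib
import subprocess
import typing as t

ClientT = t.TypeVar("ClientT")


class SubprocessServer(t.Generic[ClientT]):
    """Hypothetical stand-in for a generic server base class (not BentoML's code)."""

    def __init__(
        self,
        args: t.List[str],
        make_client: t.Callable[[], ClientT],
        grace: float = 5.0,
    ) -> None:
        self.args = args                # command line of the process to launch
        self.make_client = make_client  # factory for the protocol-specific client
        self.grace = grace              # seconds to wait after a graceful stop request
        self.process: t.Optional[subprocess.Popen] = None

    def start(self) -> t.ContextManager[ClientT]:
        # In this sketch the process is launched as soon as start() is called;
        # the returned context manager only hands out the client and cleans up.
        self.process = subprocess.Popen(self.args)
        return self._managed()

    @contextlib.contextmanager
    def _managed(self) -> t.Iterator[ClientT]:
        try:
            yield self.make_client()
        finally:
            self.stop()

    def stop(self) -> None:
        if self.process is None or self.process.poll() is not None:
            return  # never started, or already exited
        self.process.terminate()  # graceful request (SIGTERM on POSIX)
        try:
            self.process.wait(timeout=self.grace)
        except subprocess.TimeoutExpired:
            self.process.kill()   # forceful fallback (SIGKILL on POSIX)
            self.process.wait()
```

Parameterizing the class by ClientT lets a type checker infer the type of the object bound by "with server.start() as client", which is the same benefit Server[HTTPClient] and Server[GrpcClient] provide in the signatures listed on this page.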

Usage

Use these classes when you need to programmatically start a BentoML server from Python code (for example, in integration tests or notebooks) and interact with it via a client. Prefer bentoml.serve() for new code as this module is deprecated.

Code Reference

Source Location

Repository: Bentoml_BentoML
File: src/bentoml/server.py
Lines: 1-478

Signature

class Server(ABC, t.Generic[ClientType]):
    def __init__(
        self,
        servable: str | Bento | Tag | Service | NewService[t.Any],
        serve_cmd: str,
        reload: bool,
        production: bool,
        env: t.Literal["conda"] | None,
        host: str,
        port: int,
        working_dir: str | None,
        api_workers: int | None,
        backlog: int,
        bento: str | Bento | Tag | Service | None = None,
        timeout: float = 10,
    ): ...

    def start(
        self,
        blocking: bool = False,
        env: dict[str, str] | None = None,
        stdin: _FILE = None,
        stdout: _FILE = None,
        stderr: _FILE = None,
        text: bool | None = None,
    ) -> t.ContextManager[ClientType]: ...

    def get_client(self) -> ClientType: ...
    def stop(self) -> None: ...

class HTTPServer(Server[HTTPClient]):
    def __init__(
        self,
        bento: str | Bento | Tag | Service,
        reload: bool = False,
        production: bool = True,
        env: t.Literal["conda"] | None = None,
        host: str = ...,
        port: int = ...,
        timeout: float = 10,
        working_dir: str | None = None,
        api_workers: int | None = ...,
        backlog: int = ...,
        ssl_certfile: str | None = ...,
        ssl_keyfile: str | None = ...,
        ssl_keyfile_password: str | None = ...,
        ssl_version: int | None = ...,
        ssl_cert_reqs: int | None = ...,
        ssl_ca_certs: str | None = ...,
        ssl_ciphers: str | None = ...,
        timeout_keep_alive: int | None = None,
        timeout_graceful_shutdown: int | None = None,
    ): ...

class GrpcServer(Server[GrpcClient]):
    def __init__(
        self,
        bento: str | Bento | Tag | Service,
        reload: bool = False,
        production: bool = True,
        env: t.Literal["conda"] | None = None,
        host: str = ...,
        port: int = ...,
        timeout: float = 10,
        working_dir: str | None = None,
        api_workers: int | None = ...,
        backlog: int = ...,
        enable_reflection: bool = ...,
        enable_channelz: bool = ...,
        max_concurrent_streams: int | None = ...,
        ssl_certfile: str | None = ...,
        ssl_keyfile: str | None = ...,
        ssl_ca_certs: str | None = ...,
        grpc_protocol_version: str | None = None,
    ): ...

Import

from bentoml.server import HTTPServer, GrpcServer, Server

I/O Contract

Inputs

Name	Type	Required	Description
servable	str | Bento | Tag | Service | NewService[t.Any]	Yes	The BentoML service or bento tag to serve (passed as bento to HTTPServer and GrpcServer)
host	str	No	The host address to bind (default from BentoMLContainer config)
port	int	No	The port number to bind (default from BentoMLContainer config)
reload	bool	No	Enable auto-reload on code changes (default False)
production	bool	No	Run in production mode (default True)
working_dir	str | None	No	Working directory for the service
api_workers	int | None	No	Number of API worker processes
backlog	int	No	Socket backlog size
timeout	float	No	Timeout in seconds to wait for server readiness (default 10)
ssl_certfile	str | None	No	Path to SSL certificate file (HTTPServer/GrpcServer)
ssl_keyfile	str | None	No	Path to SSL key file (HTTPServer/GrpcServer)
blocking	bool	No	If True, start() blocks until server stops (default False)

Outputs

Name	Type	Description
start() return	t.ContextManager[ClientType]	Context manager yielding HTTPClient or GrpcClient
get_client() return	HTTPClient or GrpcClient	Client connected to the running server

Usage Examples

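# Start an HTTP server and interact with it via client
from bentoml.server import HTTPServer

# Placeholder payload; the expected shape depends on the service's "predict" API
input_data = [[5.9, 3.0, 5.1, 1.8]]

server = HTTPServer("my_service:latest", host="127.0.0.1", port=3000)
# start() returns a context manager that yields an HTTPClient
with server.start() as client:
    response = client.call("predict", input_data)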

# Start a gRPC server
from bentoml.server import GrpcServer

grpc_server = GrpcServer("my_service:latest", host="127.0.0.1", port=50051)
with grpc_server.start() as client:
    response = client.call("predict", input_data)

# Start in blocking mode (blocks until CTRL+C)
server = HTTPServer("my_service:latest")
server.start(blocking=True)
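When a with block is inconvenient (for example, when setup and teardown live in separate test hooks), the same lifecycle can be driven through start(), get_client(), and stop() directly. The helper below is a sketch under assumptions: call_once is a hypothetical name, the server argument is duck-typed rather than tied to HTTPServer or GrpcServer, and it assumes get_client() is usable once start() has returned without entering the context manager.

```python
import typing as t


def call_once(server: t.Any, api_name: str, payload: t.Any) -> t.Any:
    """Hypothetical helper: start `server`, call one API, and always stop it.

    `server` is any object exposing start(), get_client(), and stop() as
    listed in the Signature section; it is not type-checked here.
    """
    server.start()
    try:
        client = server.get_client()
        return client.call(api_name, payload)
    finally:
        # Runs even if get_client() or call() raises, so that the
        # subprocess is not left running after a failure.
        server.stop()
```

With one of the classes on this page, the call would look like call_once(HTTPServer("my_service:latest"), "predict", input_data), where the tag and API name are the same placeholders used in the examples above.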
