Implementation:Bentoml BentoML Server Classes
| Knowledge Sources | |
|---|---|
| Domains | Serving, Server Management |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides deprecated programmatic server classes (Server, HTTPServer, GrpcServer) for starting and managing BentoML serving processes as subprocesses with client connectivity.
Description
The server.py module defines an abstract base class Server and two concrete implementations, HTTPServer and GrpcServer, that allow users to programmatically launch BentoML serving processes. Each server wraps a subprocess invocation of the BentoML CLI serve command, manages the process lifecycle (start, wait-until-ready, stop), and provides typed client access (HTTPClient or GrpcClient) for communicating with the running server.
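The lifecycle described above (launch a child process, then stop it gracefully and fall back to a forceful kill) can be sketched with the standard library alone. This is a minimal illustration, not the code in server.py: `ManagedProcess`, `is_running`, and `grace_period` are hypothetical names, and a sleeping Python child stands in for the BentoML serve command.

```python
import subprocess
import sys
from typing import Optional, Sequence


class ManagedProcess:
    """Hypothetical helper that owns one child process and its shutdown."""

    def __init__(self, args: Sequence[str]) -> None:
        self.args = list(args)
        self.process: Optional[subprocess.Popen] = None

    def start(self) -> None:
        # Launch the child without blocking the caller.
        self.process = subprocess.Popen(self.args)

    def is_running(self) -> bool:
        return self.process is not None and self.process.poll() is None

    def stop(self, grace_period: float = 5.0) -> None:
        if self.process is None or self.process.poll() is not None:
            return  # never started, or already exited
        self.process.terminate()  # graceful: ask the child to exit
        try:
            self.process.wait(timeout=grace_period)
        except subprocess.TimeoutExpired:
            self.process.kill()  # forceful: the child ignored the request
            self.process.wait()


# Stand-in for the serve command (assumption): a child that sleeps for a minute.
proc = ManagedProcess([sys.executable, "-c", "import time; time.sleep(60)"])
proc.start()
started = proc.is_running()
proc.stop()
stopped = not proc.is_running()
```

Waiting with a timeout before calling `kill()` gives the child a chance to release ports and flush logs, while still guaranteeing that `stop()` returns.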
The module is marked as deprecated at import time; users are directed to use bentoml.serve() instead. The Server base class is generic over a ClientType bound to Client, supporting both HTTP and gRPC protocols. The start() method returns a context manager that yields the appropriate client, and the stop() method handles graceful and forceful subprocess termination. The HTTPServer additionally supports SSL configuration and timeout parameters, while GrpcServer supports gRPC-specific options such as reflection, channelz, max concurrent streams, and protocol version.
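The relationship between a generic server base class and its protocol-specific client can be sketched as follows. Everything here is illustrative: `SketchServer`, `SketchHTTPServer`, and `FakeHTTPClient` are hypothetical stand-ins, no subprocess is launched, and calling `stop()` when the `with` block exits is an assumption of this sketch. The sketch also does nothing until the `with` block is entered, which may differ from the real class.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Generic, Iterator, TypeVar


class Client:
    """Stand-in for the client base class."""

    def __init__(self, url: str) -> None:
        self.url = url


class FakeHTTPClient(Client):
    """Hypothetical protocol-specific client."""


# A type variable bound to Client: subclasses pick the concrete client type.
ClientType = TypeVar("ClientType", bound=Client)


class SketchServer(ABC, Generic[ClientType]):
    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port
        self.running = False

    @abstractmethod
    def get_client(self) -> ClientType:
        """Build the client that matches this server's protocol."""

    def stop(self) -> None:
        self.running = False

    @contextmanager
    def start(self) -> Iterator[ClientType]:
        self.running = True  # a real server would launch its subprocess here
        try:
            yield self.get_client()
        finally:
            self.stop()  # assumption: cleanup runs even if the with-body raises


class SketchHTTPServer(SketchServer[FakeHTTPClient]):
    def get_client(self) -> FakeHTTPClient:
        return FakeHTTPClient(f"http://{self.host}:{self.port}")


server = SketchHTTPServer("127.0.0.1", 3000)
with server.start() as client:
    url_inside = client.url
    running_inside = server.running
running_after = server.running
```

Parameterizing the base class lets a type checker infer that `client` is a `FakeHTTPClient` inside the `with` block, without any cast at the call site.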
Usage
Use these classes when you need to start a BentoML server from Python code (for example, in integration tests or notebooks) and interact with it through a client. Because this module is deprecated, prefer bentoml.serve() for new code.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/server.py
- Lines: 1-478
Signature
class Server(ABC, t.Generic[ClientType]):
    def __init__(
        self,
        servable: str | Bento | Tag | Service | NewService[t.Any],
        serve_cmd: str,
        reload: bool,
        production: bool,
        env: t.Literal["conda"] | None,
        host: str,
        port: int,
        working_dir: str | None,
        api_workers: int | None,
        backlog: int,
        bento: str | Bento | Tag | Service | None = None,
        timeout: float = 10,
    ): ...
    def start(
        self,
        blocking: bool = False,
        env: dict[str, str] | None = None,
        stdin: _FILE = None,
        stdout: _FILE = None,
        stderr: _FILE = None,
        text: bool | None = None,
    ) -> t.ContextManager[ClientType]: ...
    def get_client(self) -> ClientType: ...
    def stop(self) -> None: ...
class HTTPServer(Server[HTTPClient]):
    def __init__(
        self,
        bento: str | Bento | Tag | Service,
        reload: bool = False,
        production: bool = True,
        env: t.Literal["conda"] | None = None,
        host: str = ...,
        port: int = ...,
        timeout: float = 10,
        working_dir: str | None = None,
        api_workers: int | None = ...,
        backlog: int = ...,
        ssl_certfile: str | None = ...,
        ssl_keyfile: str | None = ...,
        ssl_keyfile_password: str | None = ...,
        ssl_version: int | None = ...,
        ssl_cert_reqs: int | None = ...,
        ssl_ca_certs: str | None = ...,
        ssl_ciphers: str | None = ...,
        timeout_keep_alive: int | None = None,
        timeout_graceful_shutdown: int | None = None,
    ): ...
class GrpcServer(Server[GrpcClient]):
    def __init__(
        self,
        bento: str | Bento | Tag | Service,
        reload: bool = False,
        production: bool = True,
        env: t.Literal["conda"] | None = None,
        host: str = ...,
        port: int = ...,
        timeout: float = 10,
        working_dir: str | None = None,
        api_workers: int | None = ...,
        backlog: int = ...,
        enable_reflection: bool = ...,
        enable_channelz: bool = ...,
        max_concurrent_streams: int | None = ...,
        ssl_certfile: str | None = ...,
        ssl_keyfile: str | None = ...,
        ssl_ca_certs: str | None = ...,
        grpc_protocol_version: str | None = None,
    ): ...
Import
from bentoml.server import HTTPServer, GrpcServer, Server
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| servable | str, Bento, Tag, Service, or NewService | Yes | The BentoML service, Bento, or tag to serve (named bento in the HTTPServer and GrpcServer constructors) |
| host | str | No | The host address to bind (default from BentoMLContainer config) |
| port | int | No | The port number to bind (default from BentoMLContainer config) |
| reload | bool | No | Enable auto-reload on code changes (default False) |
| production | bool | No | Run in production mode (default True) |
| working_dir | str or None | No | Working directory for the service |
| api_workers | int or None | No | Number of API worker processes |
| backlog | int | No | Socket backlog size |
| timeout | float | No | Timeout in seconds to wait for server readiness (default 10) |
| ssl_certfile | str or None | No | Path to SSL certificate file (HTTPServer/GrpcServer) |
| ssl_keyfile | str or None | No | Path to SSL key file (HTTPServer/GrpcServer) |
| blocking | bool | No | If True, start() blocks until server stops (default False) |
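
The timeout input bounds how long the caller waits for the server to become ready. The page does not say how readiness is detected, so the sketch below assumes the simplest possible check, a TCP connection attempt polled until a deadline. `wait_until_port_ready` and `check_interval` are hypothetical names, not part of BentoML.

```python
import socket
import time


def wait_until_port_ready(
    host: str, port: int, timeout: float = 10.0, check_interval: float = 0.1
) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=check_interval):
                return True  # something is accepting connections
        except OSError:
            time.sleep(check_interval)  # not up yet; retry shortly
    return False


# Demonstrate against a listener bound to an OS-assigned free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
open_port = listener.getsockname()[1]
ready = wait_until_port_ready("127.0.0.1", open_port, timeout=2.0)
listener.close()

# Once the listener is closed, the same port should refuse connections.
not_ready = wait_until_port_ready("127.0.0.1", open_port, timeout=0.5)
```

Using `time.monotonic()` for the deadline keeps the wait immune to wall-clock adjustments. A production check would more likely query a health endpoint than rely on a bare TCP connect.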
Outputs
| Name | Type | Description |
|---|---|---|
| start() return | t.ContextManager[ClientType] | Context manager yielding HTTPClient or GrpcClient |
| get_client() return | HTTPClient or GrpcClient | Client connected to the running server |
Usage Examples
# Start an HTTP server and interact with it via the client
from bentoml.server import HTTPServer
# Placeholder payload; replace with whatever your "predict" API expects
input_data = [[1.0, 2.0, 3.0]]
server = HTTPServer("my_service:latest", host="127.0.0.1", port=3000)
with server.start() as client:
    response = client.call("predict", input_data)
# Start a gRPC server
from bentoml.server import GrpcServer
grpc_server = GrpcServer("my_service:latest", host="127.0.0.1", port=50051)
with grpc_server.start() as client:
    response = client.call("predict", input_data)
# Start in blocking mode (blocks until CTRL+C)
server = HTTPServer("my_service:latest")
server.start(blocking=True)