Implementation: BentoML Start Functions
| Knowledge Sources | |
|---|---|
| Domains | Serving, Process Management |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides functions to start individual BentoML server components (runner servers, HTTP API servers, and gRPC API servers) in production distributed mode using the Circus process manager.
Description
The start.py module contains three primary functions for launching BentoML server components as managed process trees via the Circus process and socket manager:
- start_runner_server() -- Starts a standalone runner server process for a specific named runner. It loads the BentoML service, locates the requested runner, and creates a Circus arbiter with the appropriate watcher and socket configuration. Supports both standard BentoML runners (using Unix/TCP sockets) and Triton inference server runners.
- start_http_server() -- Starts the HTTP API server frontend. It receives a pre-computed runner_map (mapping runner names to their TCP addresses), validates that all required runners are accounted for, creates a Circus socket for the API server, and launches the HTTP API server worker processes. Supports SSL configuration and timeout parameters.
- start_grpc_server() -- Starts the gRPC API server frontend with the same runner_map pattern. It configures gRPC-specific options such as reflection, channelz, max concurrent streams, and protocol version. It also optionally spawns a Prometheus metrics server on a separate port.
All three functions use dependency injection via simple_di for default parameter values from BentoMLContainer, set up Prometheus multiprocess directories, and track usage analytics.
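The runner_map validation mentioned above can be pictured with a small stdlib sketch. The helper name `validate_runner_map` is hypothetical; the real check lives inside src/bentoml/start.py:

```python
def validate_runner_map(required: list[str], runner_map: dict[str, str]) -> None:
    # Every runner the service declares must have an address in the map,
    # otherwise the HTTP/gRPC frontend would have no backend to call.
    missing = set(required) - set(runner_map)
    if missing:
        raise ValueError(f"Missing runner addresses for: {sorted(missing)}")

# Passes: all required runners are accounted for.
validate_runner_map(["my_runner"], {"my_runner": "tcp://127.0.0.1:5001"})
```

A map with a missing entry raises, which mirrors the fail-fast behavior of the frontend start functions.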
Usage
These functions are used internally by the BentoML CLI to start individual server components in a distributed deployment topology (as opposed to the all-in-one bentoml serve). They are typically invoked by the bentoml start subcommands or by orchestration systems like Yatai.
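In a distributed topology each runner server needs its own port before its address can go into the runner_map. A minimal stdlib helper for picking a free port (hypothetical; not part of the BentoML API):

```python
import socket

def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an ephemeral port, then release it for the server to bind."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 = let the kernel choose
        return s.getsockname()[1]

print(find_free_port())
```

Note the small race window between releasing the socket and the server binding it; orchestrators often prefer statically assigned port ranges for this reason.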
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/start.py
- Lines: 1-409
Signature
```python
def start_runner_server(
    bento_identifier: str,
    working_dir: str,
    runner_name: str,
    port: int | None = None,
    host: str | None = None,
    timeout: int | None = None,
    backlog: int = Provide[BentoMLContainer.api_server_config.backlog],
) -> None: ...

def start_http_server(
    bento_identifier: str,
    runner_map: dict[str, str],
    working_dir: str,
    port: int = Provide[BentoMLContainer.api_server_config.port],
    host: str = Provide[BentoMLContainer.api_server_config.host],
    backlog: int = Provide[BentoMLContainer.api_server_config.backlog],
    api_workers: int = Provide[BentoMLContainer.api_server_workers],
    timeout: int | None = None,
    ssl_certfile: str | None = ...,
    ssl_keyfile: str | None = ...,
    ssl_keyfile_password: str | None = ...,
    ssl_version: int | None = ...,
    ssl_cert_reqs: int | None = ...,
    ssl_ca_certs: str | None = ...,
    ssl_ciphers: str | None = ...,
    timeout_keep_alive: int | None = None,
    timeout_graceful_shutdown: int | None = None,
) -> None: ...

def start_grpc_server(
    bento_identifier: str,
    runner_map: dict[str, str],
    working_dir: str,
    port: int = Provide[BentoMLContainer.grpc.port],
    host: str = Provide[BentoMLContainer.grpc.host],
    backlog: int = ...,
    api_workers: int = ...,
    reflection: bool = ...,
    channelz: bool = ...,
    max_concurrent_streams: int | None = ...,
    ssl_certfile: str | None = ...,
    ssl_keyfile: str | None = ...,
    ssl_ca_certs: str | None = ...,
    protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> None: ...
```
Import
from bentoml.start import start_runner_server, start_http_server, start_grpc_server
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bento_identifier | str | Yes | The BentoML service import string or bento tag |
| working_dir | str | Yes | Absolute path to the working directory containing the service code |
| runner_name | str | Yes (runner only) | Name of the specific runner to start |
| runner_map | dict[str, str] | Yes (HTTP/gRPC) | Mapping of runner names to their TCP addresses (e.g., {"runner1": "tcp://127.0.0.1:5001"}) |
| port | int | No | Port to bind the server to (defaults from BentoMLContainer config) |
| host | str | No | Host address to bind to (defaults from BentoMLContainer config) |
| api_workers | int | No | Number of API worker processes |
| backlog | int | No | Socket backlog size |
| timeout | int \| None | No | Request timeout in seconds |
| ssl_certfile | str \| None | No | Path to SSL certificate file |
| ssl_keyfile | str \| None | No | Path to SSL private key file |
| reflection | bool | No | Enable gRPC reflection (gRPC only) |
| channelz | bool | No | Enable gRPC channelz (gRPC only) |
| protocol_version | str | No | gRPC protocol version string (gRPC only) |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | All functions block, running the Circus arbiter until the process is stopped (CTRL+C). They do not return a value. |
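Because each function blocks on the Circus arbiter until stopped, a caller that needs to keep control typically runs it in a child process. A minimal sketch of that pattern, where the sleep loop is a stand-in for a real start_* call (none of the BentoML functions are imported here):

```python
import multiprocessing
import time

def blocking_server() -> None:
    # Stand-in for start_runner_server(...): blocks until the process is stopped.
    while True:
        time.sleep(0.1)

def launch_and_stop() -> int:
    # Run the blocking call in a child process so the caller keeps control.
    proc = multiprocessing.Process(target=blocking_server, daemon=True)
    proc.start()
    # ... supervise, start further components, etc. ...
    proc.terminate()  # the programmatic equivalent of CTRL+C
    proc.join()
    return proc.exitcode  # negative signal number on POSIX

if __name__ == "__main__":
    print(launch_and_stop())
```

This mirrors how orchestration layers keep one supervising process per server component.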
Usage Examples
```python
# Start a runner server for the "my_runner" runner
from bentoml.start import start_runner_server

start_runner_server(
    bento_identifier="my_service:latest",
    working_dir="/path/to/service",
    runner_name="my_runner",
    host="127.0.0.1",
    port=5001,
)
```

```python
# Start an HTTP API server with a pre-configured runner map
from bentoml.start import start_http_server

start_http_server(
    bento_identifier="my_service:latest",
    runner_map={"my_runner": "tcp://127.0.0.1:5001"},
    working_dir="/path/to/service",
    host="0.0.0.0",
    port=3000,
    api_workers=4,
)
```

```python
# Start a gRPC API server
from bentoml.start import start_grpc_server

start_grpc_server(
    bento_identifier="my_service:latest",
    runner_map={"my_runner": "tcp://127.0.0.1:5001"},
    working_dir="/path/to/service",
    host="0.0.0.0",
    port=50051,
    reflection=True,
)
```
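Putting the pieces together, an orchestrator first assigns each runner an address and then hands the resulting map to the frontend. A sketch of the map-building step; the helper name and the sequential-port scheme are assumptions, not BentoML API:

```python
def build_runner_map(
    runner_names: list[str], host: str = "127.0.0.1", base_port: int = 5001
) -> dict[str, str]:
    # One TCP address per runner, in the format the runner_map examples above use.
    return {
        name: f"tcp://{host}:{base_port + i}" for i, name in enumerate(runner_names)
    }

runner_map = build_runner_map(["my_runner", "other_runner"])
print(runner_map["other_runner"])  # tcp://127.0.0.1:5002
```

Each address would then be used twice: once as the host/port for start_runner_server, and once inside the runner_map passed to start_http_server or start_grpc_server.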