Implementation: BentoML Start Functions
| Knowledge Sources | |
|---|---|
| Domains | Serving, Process Management |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides functions to start individual BentoML server components (runner servers, HTTP API servers, and gRPC API servers) in production distributed mode using the Circus process manager.
Description
The start.py module contains three primary functions for launching BentoML server components as managed process trees via the Circus process and socket manager:
- start_runner_server() -- Starts a standalone runner server process for a specific named runner. It loads the BentoML service, locates the requested runner, and creates a Circus arbiter with the appropriate watcher and socket configuration. Supports both standard BentoML runners (using Unix/TCP sockets) and Triton inference server runners.
- start_http_server() -- Starts the HTTP API server frontend. It receives a pre-computed runner_map (mapping runner names to their TCP addresses), validates that all required runners are accounted for, creates a Circus socket for the API server, and launches the HTTP API server worker processes. Supports SSL configuration and timeout parameters.
- start_grpc_server() -- Starts the gRPC API server frontend with the same runner_map pattern. It configures gRPC-specific options such as reflection, channelz, max concurrent streams, and protocol version. It also optionally spawns a Prometheus metrics server on a separate port.
All three functions use dependency injection via simple_di for default parameter values from BentoMLContainer, set up Prometheus multiprocess directories, and track usage analytics.
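The runner_map validation mentioned above can be pictured with a small stdlib sketch. The helper name `validate_runner_map` is hypothetical; the real check lives inside src/bentoml/start.py:

```python
def validate_runner_map(required: list[str], runner_map: dict[str, str]) -> None:
    # Every runner the service declares must have an address in the map,
    # otherwise the HTTP/gRPC frontend would have no backend to call.
    missing = set(required) - set(runner_map)
    if missing:
        raise ValueError(f"Missing runner addresses for: {sorted(missing)}")

# Passes: all required runners are accounted for.
validate_runner_map(["my_runner"], {"my_runner": "tcp://127.0.0.1:5001"})
```

A map with a missing entry raises, which mirrors the fail-fast behavior of the frontend start functions.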
Usage
These functions are used internally by the BentoML CLI to start individual server components in a distributed deployment topology (as opposed to the all-in-one bentoml serve). They are typically invoked by the bentoml start subcommands or by orchestration systems like Yatai.
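In a distributed topology each runner server needs its own port before its address can go into the runner_map. A minimal stdlib helper for picking a free port (hypothetical; not part of the BentoML API):

```python
import socket

def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an ephemeral port, then release it for the server to bind."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 = let the kernel choose
        return s.getsockname()[1]

print(find_free_port())
```

Note the small race window between releasing the socket and the server binding it; orchestrators often prefer statically assigned port ranges for this reason.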
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/start.py
- Lines: 1-409
Signature
```python
def start_runner_server(
    bento_identifier: str,
    working_dir: str,
    runner_name: str,
    port: int | None = None,
    host: str | None = None,
    timeout: int | None = None,
    backlog: int = Provide[BentoMLContainer.api_server_config.backlog],
) -> None: ...

def start_http_server(
    bento_identifier: str,
    runner_map: dict[str, str],
    working_dir: str,
    port: int = Provide[BentoMLContainer.api_server_config.port],
    host: str = Provide[BentoMLContainer.api_server_config.host],
    backlog: int = Provide[BentoMLContainer.api_server_config.backlog],
    api_workers: int = Provide[BentoMLContainer.api_server_workers],
    timeout: int | None = None,
    ssl_certfile: str | None = ...,
    ssl_keyfile: str | None = ...,
    ssl_keyfile_password: str | None = ...,
    ssl_version: int | None = ...,
    ssl_cert_reqs: int | None = ...,
    ssl_ca_certs: str | None = ...,
    ssl_ciphers: str | None = ...,
    timeout_keep_alive: int | None = None,
    timeout_graceful_shutdown: int | None = None,
) -> None: ...

def start_grpc_server(
    bento_identifier: str,
    runner_map: dict[str, str],
    working_dir: str,
    port: int = Provide[BentoMLContainer.grpc.port],
    host: str = Provide[BentoMLContainer.grpc.host],
    backlog: int = ...,
    api_workers: int = ...,
    reflection: bool = ...,
    channelz: bool = ...,
    max_concurrent_streams: int | None = ...,
    ssl_certfile: str | None = ...,
    ssl_keyfile: str | None = ...,
    ssl_ca_certs: str | None = ...,
    protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> None: ...
```
Import
from bentoml.start import start_runner_server, start_http_server, start_grpc_server
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bento_identifier | str | Yes | The BentoML service import string or bento tag |
| working_dir | str | Yes | Absolute path to the working directory containing the service code |
| runner_name | str | Yes (runner only) | Name of the specific runner to start |
| runner_map | dict[str, str] | Yes (HTTP/gRPC) | Mapping of runner names to their TCP addresses (e.g., {"runner1": "tcp://127.0.0.1:5001"}) |
| port | int | No | Port to bind the server to (defaults from BentoMLContainer config) |
| host | str | No | Host address to bind to (defaults from BentoMLContainer config) |
| api_workers | int | No | Number of API worker processes |
| backlog | int | No | Socket backlog size |
| timeout | int \| None | No | Request timeout in seconds |
| ssl_certfile | str \| None | No | Path to SSL certificate file |
| ssl_keyfile | str \| None | No | Path to SSL private key file |
| reflection | bool | No | Enable gRPC reflection (gRPC only) |
| channelz | bool | No | Enable gRPC channelz (gRPC only) |
| protocol_version | str | No | gRPC protocol version string (gRPC only) |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | All functions block, running the Circus arbiter until the process is stopped (CTRL+C). They do not return a value. |
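Because each function blocks on the Circus arbiter until stopped, a caller that needs to keep control typically runs it in a child process. A minimal sketch of that pattern, where the sleep loop is a stand-in for a real start_* call (none of the BentoML functions are imported here):

```python
import multiprocessing
import time

def blocking_server() -> None:
    # Stand-in for start_runner_server(...): blocks until the process is stopped.
    while True:
        time.sleep(0.1)

def launch_and_stop() -> int:
    # Run the blocking call in a child process so the caller keeps control.
    proc = multiprocessing.Process(target=blocking_server, daemon=True)
    proc.start()
    # ... supervise, start further components, etc. ...
    proc.terminate()  # the programmatic equivalent of CTRL+C
    proc.join()
    return proc.exitcode  # negative signal number on POSIX

if __name__ == "__main__":
    print(launch_and_stop())
```

This mirrors how orchestration layers keep one supervising process per server component.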
Usage Examples
```python
# Start a runner server for the "my_runner" runner
from bentoml.start import start_runner_server

start_runner_server(
    bento_identifier="my_service:latest",
    working_dir="/path/to/service",
    runner_name="my_runner",
    host="127.0.0.1",
    port=5001,
)
```

```python
# Start an HTTP API server with a pre-configured runner map
from bentoml.start import start_http_server

start_http_server(
    bento_identifier="my_service:latest",
    runner_map={"my_runner": "tcp://127.0.0.1:5001"},
    working_dir="/path/to/service",
    host="0.0.0.0",
    port=3000,
    api_workers=4,
)
```

```python
# Start a gRPC API server
from bentoml.start import start_grpc_server

start_grpc_server(
    bento_identifier="my_service:latest",
    runner_map={"my_runner": "tcp://127.0.0.1:5001"},
    working_dir="/path/to/service",
    host="0.0.0.0",
    port=50051,
    reflection=True,
)
```
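Putting the pieces together, an orchestrator first assigns each runner an address and then hands the resulting map to the frontend. A sketch of the map-building step; the helper name and the sequential-port scheme are assumptions, not BentoML API:

```python
def build_runner_map(
    runner_names: list[str], host: str = "127.0.0.1", base_port: int = 5001
) -> dict[str, str]:
    # One TCP address per runner, in the format the runner_map examples above use.
    return {
        name: f"tcp://{host}:{base_port + i}" for i, name in enumerate(runner_names)
    }

runner_map = build_runner_map(["my_runner", "other_runner"])
print(runner_map["other_runner"])  # tcp://127.0.0.1:5002
```

Each address would then be used twice: once as the host/port for start_runner_server, and once inside the runner_map passed to start_http_server or start_grpc_server.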