Implementation:Bentoml BentoML Start Functions

From Leeroopedia
Knowledge Sources
Domains Serving, Process Management
Last Updated 2026-02-13 15:00 GMT

Overview

Provides functions to start individual BentoML server components (runner servers, HTTP API servers, and gRPC API servers) in production distributed mode using the Circus process manager.

Description

The start.py module contains three primary functions for launching BentoML server components as managed process trees via the Circus process and socket manager:

  • start_runner_server() -- Starts a standalone runner server process for a specific named runner. It loads the BentoML service, locates the requested runner, and creates a Circus arbiter with the appropriate watcher and socket configuration. Supports both standard BentoML runners (using Unix/TCP sockets) and Triton inference server runners.
  • start_http_server() -- Starts the HTTP API server frontend. It receives a pre-computed runner_map (mapping runner names to their TCP addresses), validates that all required runners are accounted for, creates a Circus socket for the API server, and launches the HTTP API server worker processes. Supports SSL configuration and timeout parameters.
  • start_grpc_server() -- Starts the gRPC API server frontend with the same runner_map pattern. It configures gRPC-specific options such as reflection, channelz, max concurrent streams, and protocol version. It also optionally spawns a Prometheus metrics server on a separate port.
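
The runner-map validation that start_http_server and start_grpc_server perform can be sketched as follows. This is a hypothetical stand-in, not BentoML API: the helper name and its exact behavior are assumptions based on the description above (every runner the service declares must have an address in runner_map).

```python
# Hypothetical sketch of the check the HTTP/gRPC frontends perform:
# every runner the service declares must have an address in runner_map.
def validate_runner_map(required_runners: list[str],
                        runner_map: dict[str, str]) -> dict[str, str]:
    missing = [name for name in required_runners if name not in runner_map]
    if missing:
        raise ValueError(
            f"runner_map is missing addresses for: {', '.join(missing)}"
        )
    # Keep only the runners the service actually uses.
    return {name: runner_map[name] for name in required_runners}
```

A frontend started with an incomplete map fails fast at startup rather than at request time, which is why the map is validated before the Circus arbiter launches.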

All three functions use dependency injection via simple_di for default parameter values from BentoMLContainer, set up Prometheus multiprocess directories, and track usage analytics.
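
The `Provide[...]` defaults visible in the signatures below follow simple_di's injection pattern: the default is a sentinel that is resolved against the container when the function is called. The following self-contained mimic illustrates the mechanism only; it is not simple_di itself, and `Container`, `start_server`, and the config values are hypothetical stand-ins for BentoMLContainer entries.

```python
import functools
import inspect

class Provide:
    """Sentinel wrapping a zero-argument getter; resolved at call time."""
    def __init__(self, getter):
        self.getter = getter

def inject(fn):
    """Replace any Provide default with its container value on each call."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind_partial(*args, **kwargs)
        for name, param in sig.parameters.items():
            if name not in bound.arguments and isinstance(param.default, Provide):
                bound.arguments[name] = param.default.getter()
        return fn(*bound.args, **bound.kwargs)

    return wrapper

class Container:
    # Hypothetical config values standing in for BentoMLContainer entries.
    port = 3000
    backlog = 2048

@inject
def start_server(bento_identifier,
                 port=Provide(lambda: Container.port),
                 backlog=Provide(lambda: Container.backlog)):
    return {"bento": bento_identifier, "port": port, "backlog": backlog}
```

Calling `start_server("svc:latest")` resolves port and backlog from the container; explicitly passed arguments always win over injected defaults.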

Usage

These functions are used internally by the BentoML CLI to start individual server components in a distributed deployment topology (as opposed to the all-in-one bentoml serve). They are typically invoked by the bentoml start subcommands or by orchestration systems like Yatai.

Code Reference

Source Location

Signature

def start_runner_server(
    bento_identifier: str,
    working_dir: str,
    runner_name: str,
    port: int | None = None,
    host: str | None = None,
    timeout: int | None = None,
    backlog: int = Provide[BentoMLContainer.api_server_config.backlog],
) -> None: ...

def start_http_server(
    bento_identifier: str,
    runner_map: dict[str, str],
    working_dir: str,
    port: int = Provide[BentoMLContainer.api_server_config.port],
    host: str = Provide[BentoMLContainer.api_server_config.host],
    backlog: int = Provide[BentoMLContainer.api_server_config.backlog],
    api_workers: int = Provide[BentoMLContainer.api_server_workers],
    timeout: int | None = None,
    ssl_certfile: str | None = ...,
    ssl_keyfile: str | None = ...,
    ssl_keyfile_password: str | None = ...,
    ssl_version: int | None = ...,
    ssl_cert_reqs: int | None = ...,
    ssl_ca_certs: str | None = ...,
    ssl_ciphers: str | None = ...,
    timeout_keep_alive: int | None = None,
    timeout_graceful_shutdown: int | None = None,
) -> None: ...

def start_grpc_server(
    bento_identifier: str,
    runner_map: dict[str, str],
    working_dir: str,
    port: int = Provide[BentoMLContainer.grpc.port],
    host: str = Provide[BentoMLContainer.grpc.host],
    backlog: int = ...,
    api_workers: int = ...,
    reflection: bool = ...,
    channelz: bool = ...,
    max_concurrent_streams: int | None = ...,
    ssl_certfile: str | None = ...,
    ssl_keyfile: str | None = ...,
    ssl_ca_certs: str | None = ...,
    protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> None: ...

Import

from bentoml.start import start_runner_server, start_http_server, start_grpc_server

I/O Contract

Inputs

Name Type Required Description
bento_identifier str Yes The BentoML service import string or bento tag
working_dir str Yes Absolute path to the working directory containing the service code
runner_name str Yes (runner only) Name of the specific runner to start
runner_map dict[str, str] Yes (HTTP/gRPC) Mapping of runner names to their TCP addresses (e.g., {"runner1": "tcp://127.0.0.1:5001"})
port int No Port to bind the server to (defaults from BentoMLContainer config)
host str No Host address to bind to (defaults from BentoMLContainer config)
api_workers int No Number of API worker processes
backlog int No Socket backlog size
timeout int | None No Request timeout in seconds
ssl_certfile str | None No Path to SSL certificate file
ssl_keyfile str | None No Path to SSL private key file
reflection bool No Enable gRPC reflection (gRPC only)
channelz bool No Enable gRPC channelz (gRPC only)
protocol_version str No gRPC protocol version string (gRPC only)
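
When wiring components by hand, runner_map entries must use the `tcp://host:port` form shown above. A small convenience helper (hypothetical, not part of BentoML) makes the expected shape explicit:

```python
def make_runner_map(runners: dict[str, tuple[str, int]]) -> dict[str, str]:
    """Build a runner_map in the "tcp://host:port" form that
    start_http_server and start_grpc_server expect.
    Hypothetical helper, not BentoML API."""
    return {
        name: f"tcp://{host}:{port}"
        for name, (host, port) in runners.items()
    }
```

For example, `make_runner_map({"my_runner": ("127.0.0.1", 5001)})` yields the map used in the usage examples below.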

Outputs

Name Type Description
(none) None All functions block, running the Circus arbiter until the process is stopped (CTRL+C). They do not return a value.

Usage Examples

# Start a runner server for the "my_runner" runner
from bentoml.start import start_runner_server

start_runner_server(
    bento_identifier="my_service:latest",
    working_dir="/path/to/service",
    runner_name="my_runner",
    host="127.0.0.1",
    port=5001,
)

# Start an HTTP API server with a pre-configured runner map
from bentoml.start import start_http_server

start_http_server(
    bento_identifier="my_service:latest",
    runner_map={"my_runner": "tcp://127.0.0.1:5001"},
    working_dir="/path/to/service",
    host="0.0.0.0",
    port=3000,
    api_workers=4,
)

# Start a gRPC API server
from bentoml.start import start_grpc_server

start_grpc_server(
    bento_identifier="my_service:latest",
    runner_map={"my_runner": "tcp://127.0.0.1:5001"},
    working_dir="/path/to/service",
    host="0.0.0.0",
    port=50051,
    reflection=True,
)
