Implementation: BentoML Testing Server
| Knowledge Sources | |
|---|---|
| Domains | Testing, Serving |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides utility functions and context managers for building, containerizing, and launching BentoML servers in various deployment modes during integration testing.
Description
The testing/server.py module offers a comprehensive set of tools for spinning up BentoML servers in test environments. It supports three deployment modes: standalone, distributed, and container-based. The module includes:
Helper Functions:
- parse_multipart_form() -- Async helper to parse multipart form data from Starlette headers and body bytes.
- kill_subprocess_tree() -- Cross-platform function to terminate a subprocess and its children. On Windows it uses taskkill; on other platforms it calls terminate().
- server_warmup() -- Async function that polls a server URL (HTTP or gRPC) until it reports ready or a timeout is reached. For HTTP, it checks the /readyz endpoint; for gRPC, it uses the health check service.
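The polling pattern behind server_warmup() can be sketched in pure Python. This is a minimal stand-in, not the library's implementation: the real helper probes /readyz over HTTP (or the gRPC health service) rather than a caller-supplied callable, and `probe` here is a hypothetical parameter used for illustration.

```python
import time


def wait_until_ready(probe, timeout: float, check_interval: float = 0.01) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    Mirrors the shape of server_warmup(): retry on a fixed interval,
    give up and report False once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(check_interval)
    return False


# A probe that succeeds on the third attempt, simulating a server booting.
attempts = {"n": 0}

def flaky_probe() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until_ready(flaky_probe, timeout=5))  # True once the probe succeeds
```

The real function's check_interval default is 1 second, which suits container startup; tests of the loop itself can use a much shorter interval.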
Build and Containerize Context Managers:
- build() -- Cached context manager that builds a BentoML project from a given path and optionally cleans it up on exit.
- containerize() -- Cached context manager that builds a Docker (or alternative backend) container image from a bento tag and optionally removes it on exit.
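The two context managers above chain naturally when a test needs a container image: build the project, then containerize the resulting bento. A sketch under assumptions (the `./my_project` path is hypothetical, and imports are deferred so the snippet stays importable without a BentoML project on disk):

```python
def build_and_containerize(project_path: str = "./my_project") -> None:
    # Deferred imports: this sketch only needs BentoML when actually run
    # against a real project directory containing a bentofile.yaml.
    from bentoml.testing.server import build, containerize

    with build(project_path, cleanup=True) as bento:
        # bento is the built Bento object; its tag keys the container build.
        with containerize(bento.tag, cleanup=True, backend="docker") as image_tag:
            print(f"built image {image_tag} from {bento.tag}")
```

Because both are cached, a second call with the same arguments inside one test session reuses the bento and image instead of rebuilding them.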
Server Launch Context Managers:
- run_bento_server_container() -- Launches a BentoML server inside a container with port mapping, config file mounting, and gRPC support. Yields the host URL after warmup.
- run_bento_server_standalone() -- Launches a BentoML server directly via the CLI as a subprocess. Yields the host URL.
- run_bento_server_distributed() -- Simulates a distributed (Yatai-like) deployment by starting individual runner server processes and an API server process separately, wiring them together with a runner map. Yields the host URL.
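These launch context managers can also be used directly when a test needs one specific mode rather than the host_bento() orchestration. A minimal standalone-mode sketch (the bento tag "my_service:latest" and the readiness check are illustrative; imports are deferred so the snippet stays importable without a built bento or the requests package):

```python
def check_standalone_readyz() -> None:
    # Deferred imports keep this definition loadable anywhere; running it
    # requires BentoML, requests, and a locally built bento.
    import requests
    from bentoml.testing.server import run_bento_server_standalone

    with run_bento_server_standalone("my_service:latest", timeout=90) as host_url:
        # host_url is host:port; the server is already warmed up here.
        resp = requests.get(f"http://{host_url}/readyz")
        assert resp.status_code == 200
    # Leaving the block terminates the server subprocess tree.
```

run_bento_server_container() and run_bento_server_distributed() follow the same shape: enter the context, receive a warmed-up host URL, and rely on exit for cleanup.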
Orchestration:
- host_bento() -- High-level cached context manager that orchestrates the full lifecycle: build (if needed), optionally containerize, and launch a server in the requested deployment mode. It delegates to the appropriate run_bento_server_* function based on the deployment_mode parameter.
All context managers use cached_contextmanager to enable reuse within the same test session, avoiding redundant builds and container image creation.
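The caching behavior can be illustrated with a plain contextmanager and a dict keyed on the arguments. This is a simplified stand-in for cached_contextmanager (the real decorator keys on a format string over the arguments and defers teardown to the end of the session; this sketch omits teardown entirely):

```python
import contextlib

setup_calls = []

def expensive_build(path: str) -> str:
    # Stands in for a bento build or container image build.
    setup_calls.append(path)
    return f"bento-for-{path}"

_cache: dict[str, str] = {}

@contextlib.contextmanager
def cached_build(path: str):
    # First entry with a given path performs the build; later entries
    # within the session reuse the cached result.
    if path not in _cache:
        _cache[path] = expensive_build(path)
    yield _cache[path]

with cached_build("./proj") as a:
    pass
with cached_build("./proj") as b:
    pass

print(len(setup_calls))  # 1 -- the second entry reused the cache
print(a == b)            # True
```

This is why two tests that host the same bento in the same session pay the build and containerize cost only once.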
Usage
Use these utilities in BentoML integration tests to start a server in a controlled environment, interact with it at the yielded host URL, and have it automatically cleaned up when the test completes.
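When many tests share one server, the clean_context parameter lets the caller own cleanup timing: pass a shared contextlib.ExitStack so cached builds and images are torn down once at session end rather than per test. A sketch of that wiring (the project path is hypothetical, and the BentoML import is deferred so the snippet stays importable on its own):

```python
import contextlib


def run_session(tests) -> None:
    # One shared ExitStack for the whole session: host_bento registers
    # cleanup of cached bentos/images on it instead of running cleanup
    # when each individual context exits.
    from bentoml.testing.server import host_bento

    with contextlib.ExitStack() as stack:
        with host_bento(
            project_path="./my_project",  # assumed project layout
            deployment_mode="standalone",
            clean_context=stack,
        ) as host_url:
            for test in tests:
                test(host_url)
    # Closing the ExitStack tears down everything built for the session.
```

In pytest this pattern typically lives in a session-scoped fixture that yields the host URL.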
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/testing/server.py
- Lines: 1-535
Signature
async def parse_multipart_form(headers: Headers, body: bytes) -> FormData: ...
def kill_subprocess_tree(p: subprocess.Popen[t.Any]) -> None: ...
async def server_warmup(
host_url: str,
timeout: float,
grpc: bool = False,
check_interval: float = 1,
popen: subprocess.Popen[t.Any] | None = None,
service_name: str | None = None,
protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> bool: ...
@cached_contextmanager("{project_path}, {cleanup}")
def build(project_path: str, cleanup: bool = True) -> t.Generator[Bento, None, None]: ...
@cached_contextmanager("{bento_tag}, {image_tag}, {cleanup}, {use_grpc}")
def containerize(
bento_tag: str | Tag,
image_tag: str | None = None,
cleanup: bool = True,
use_grpc: bool = False,
backend: str = "docker",
**attrs: t.Any,
) -> t.Generator[str, None, None]: ...
@cached_contextmanager(...)
def run_bento_server_container(
image_tag: str,
config_file: str | None = None,
use_grpc: bool = False,
timeout: float = 90,
host: str = "127.0.0.1",
backend: str = "docker",
protocol_version: str = LATEST_PROTOCOL_VERSION,
platform: str = "linux/amd64",
): ...
@contextmanager
def run_bento_server_standalone(
bento: str,
use_grpc: bool = False,
config_file: str | None = None,
timeout: float = 90,
host: str = "127.0.0.1",
protocol_version: str = LATEST_PROTOCOL_VERSION,
): ...
@contextmanager
def run_bento_server_distributed(
bento_tag: str | Tag,
config_file: str | None = None,
use_grpc: bool = False,
timeout: float = 90,
host: str = "127.0.0.1",
protocol_version: str = LATEST_PROTOCOL_VERSION,
): ...
@cached_contextmanager(...)
def host_bento(
bento_name: str | Tag | None = None,
project_path: str = ".",
config_file: str | None = None,
deployment_mode: t.Literal["standalone", "distributed", "container"] = "standalone",
bentoml_home: str | None = None,
use_grpc: bool = False,
clean_context: contextlib.ExitStack | None = None,
host: str = "127.0.0.1",
timeout: float = 120,
backend: str = "docker",
protocol_version: str = LATEST_PROTOCOL_VERSION,
    container_mode_options: dict[str, t.Any] | None = None,
) -> t.Generator[str, None, None]: ...
Import
from bentoml.testing.server import host_bento, server_warmup, build, containerize
from bentoml.testing.server import run_bento_server_standalone
from bentoml.testing.server import run_bento_server_distributed
from bentoml.testing.server import run_bento_server_container
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bento_name | str \| Tag \| None | No | Bento tag to serve; if None, the project is built first |
| project_path | str | No | Path to the BentoML project directory (default ".") |
| config_file | str \| None | No | Path to a BentoML configuration YAML file |
| deployment_mode | Literal["standalone", "distributed", "container"] | No | How to run the server (default "standalone") |
| use_grpc | bool | No | Whether to launch a gRPC server instead of HTTP (default False) |
| host | str | No | Host address to bind (default "127.0.0.1") |
| timeout | float | No | Timeout in seconds for server warmup (default varies: 90-120) |
| backend | str | No | Container backend (default "docker") |
| protocol_version | str | No | gRPC protocol version string |
| clean_context | contextlib.ExitStack \| None | No | Shared exit stack for cleanup (useful in test sessions) |
Outputs
| Name | Type | Description |
|---|---|---|
| host_url | str | The host:port URL of the running server (e.g., "127.0.0.1:3000"), yielded by context managers |
| bento | Bento | The built Bento object (from build() context manager) |
| image_tag | str | The container image tag (from containerize() context manager) |
| warmup result | bool | True if server became ready within the timeout, False otherwise |
Usage Examples
# Host a bento in standalone mode for testing
from bentoml.testing.server import host_bento
with host_bento(
    bento_name="my_service:latest",
    deployment_mode="standalone",
    timeout=60,
) as host_url:
    # host_url is something like "127.0.0.1:3000"
    import requests

    resp = requests.get(f"http://{host_url}/readyz")
    assert resp.status_code == 200

# Host in distributed mode (simulating Yatai)
with host_bento(
    project_path="./my_project",
    deployment_mode="distributed",
    use_grpc=False,
) as host_url:
    # Test against the distributed server
    pass
# Use server_warmup directly
import asyncio
from bentoml.testing.server import server_warmup
ready = asyncio.run(server_warmup("127.0.0.1:3000", timeout=30))
assert ready