Implementation: BentoML Testing Server
| Knowledge Sources | |
|---|---|
| Domains | Testing, Serving |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides utility functions and context managers for building, containerizing, and launching BentoML servers in various deployment modes during integration testing.
Description
The testing/server.py module offers a comprehensive set of tools for spinning up BentoML servers in test environments. It supports three deployment modes: standalone, distributed, and container-based. The module includes:
Helper Functions:
- parse_multipart_form() -- Async helper to parse multipart form data from Starlette headers and body bytes.
- kill_subprocess_tree() -- Cross-platform function to terminate a subprocess and its children. On Windows it uses taskkill; on other platforms it calls terminate().
- server_warmup() -- Async function that polls a server URL (HTTP or gRPC) until it reports ready or a timeout is reached. For HTTP, it checks the /readyz endpoint; for gRPC, it uses the health check service.
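The polling pattern behind server_warmup() can be sketched in pure Python. This is a minimal stand-in, not the library's implementation: the real helper probes /readyz over HTTP (or the gRPC health service) rather than a caller-supplied callable, and `probe` here is a hypothetical parameter used for illustration.

```python
import time


def wait_until_ready(probe, timeout: float, check_interval: float = 0.01) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    Mirrors the shape of server_warmup(): retry on a fixed interval,
    give up and report False once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(check_interval)
    return False


# A probe that succeeds on the third attempt, simulating a server booting.
attempts = {"n": 0}

def flaky_probe() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until_ready(flaky_probe, timeout=5))  # True once the probe succeeds
```

The real function's check_interval default is 1 second, which suits container startup; tests of the loop itself can use a much shorter interval.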
Build and Containerize Context Managers:
- build() -- Cached context manager that builds a BentoML project from a given path and optionally cleans it up on exit.
- containerize() -- Cached context manager that builds a Docker (or alternative backend) container image from a bento tag and optionally removes it on exit.
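The two context managers above chain naturally when a test needs a container image: build the project, then containerize the resulting bento. A sketch under assumptions (the `./my_project` path is hypothetical, and imports are deferred so the snippet stays importable without a BentoML project on disk):

```python
def build_and_containerize(project_path: str = "./my_project") -> None:
    # Deferred imports: this sketch only needs BentoML when actually run
    # against a real project directory containing a bentofile.yaml.
    from bentoml.testing.server import build, containerize

    with build(project_path, cleanup=True) as bento:
        # bento is the built Bento object; its tag keys the container build.
        with containerize(bento.tag, cleanup=True, backend="docker") as image_tag:
            print(f"built image {image_tag} from {bento.tag}")
```

Because both are cached, a second call with the same arguments inside one test session reuses the bento and image instead of rebuilding them.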
Server Launch Context Managers:
- run_bento_server_container() -- Launches a BentoML server inside a container with port mapping, config file mounting, and gRPC support. Yields the host URL after warmup.
- run_bento_server_standalone() -- Launches a BentoML server directly via the CLI as a subprocess. Yields the host URL.
- run_bento_server_distributed() -- Simulates a distributed (Yatai-like) deployment by starting individual runner server processes and an API server process separately, wiring them together with a runner map. Yields the host URL.
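These launch context managers can also be used directly when a test needs one specific mode rather than the host_bento() orchestration. A minimal standalone-mode sketch (the bento tag "my_service:latest" and the readiness check are illustrative; imports are deferred so the snippet stays importable without a built bento or the requests package):

```python
def check_standalone_readyz() -> None:
    # Deferred imports keep this definition loadable anywhere; running it
    # requires BentoML, requests, and a locally built bento.
    import requests
    from bentoml.testing.server import run_bento_server_standalone

    with run_bento_server_standalone("my_service:latest", timeout=90) as host_url:
        # host_url is host:port; the server is already warmed up here.
        resp = requests.get(f"http://{host_url}/readyz")
        assert resp.status_code == 200
    # Leaving the block terminates the server subprocess tree.
```

run_bento_server_container() and run_bento_server_distributed() follow the same shape: enter the context, receive a warmed-up host URL, and rely on exit for cleanup.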
Orchestration:
- host_bento() -- High-level cached context manager that orchestrates the full lifecycle: build (if needed), optionally containerize, and launch a server in the requested deployment mode. It delegates to the appropriate run_bento_server_* function based on the deployment_mode parameter.
All context managers use cached_contextmanager to enable reuse within the same test session, avoiding redundant builds and container image creation.
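The caching behavior can be illustrated with a plain contextmanager and a dict keyed on the arguments. This is a simplified stand-in for cached_contextmanager (the real decorator keys on a format string over the arguments and defers teardown to the end of the session; this sketch omits teardown entirely):

```python
import contextlib

setup_calls = []

def expensive_build(path: str) -> str:
    # Stands in for a bento build or container image build.
    setup_calls.append(path)
    return f"bento-for-{path}"

_cache: dict[str, str] = {}

@contextlib.contextmanager
def cached_build(path: str):
    # First entry with a given path performs the build; later entries
    # within the session reuse the cached result.
    if path not in _cache:
        _cache[path] = expensive_build(path)
    yield _cache[path]

with cached_build("./proj") as a:
    pass
with cached_build("./proj") as b:
    pass

print(len(setup_calls))  # 1 -- the second entry reused the cache
print(a == b)            # True
```

This is why two tests that host the same bento in the same session pay the build and containerize cost only once.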
Usage
Use these utilities in BentoML integration tests to start a server in a controlled environment, interact with it at the yielded host URL, and have it automatically cleaned up when the test completes.
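When many tests share one server, the clean_context parameter lets the caller own cleanup timing: pass a shared contextlib.ExitStack so cached builds and images are torn down once at session end rather than per test. A sketch of that wiring (the project path is hypothetical, and the BentoML import is deferred so the snippet stays importable on its own):

```python
import contextlib


def run_session(tests) -> None:
    # One shared ExitStack for the whole session: host_bento registers
    # cleanup of cached bentos/images on it instead of running cleanup
    # when each individual context exits.
    from bentoml.testing.server import host_bento

    with contextlib.ExitStack() as stack:
        with host_bento(
            project_path="./my_project",  # assumed project layout
            deployment_mode="standalone",
            clean_context=stack,
        ) as host_url:
            for test in tests:
                test(host_url)
    # Closing the ExitStack tears down everything built for the session.
```

In pytest this pattern typically lives in a session-scoped fixture that yields the host URL.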
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/testing/server.py
- Lines: 1-535
Signature
async def parse_multipart_form(headers: Headers, body: bytes) -> FormData: ...
def kill_subprocess_tree(p: subprocess.Popen[t.Any]) -> None: ...
async def server_warmup(
host_url: str,
timeout: float,
grpc: bool = False,
check_interval: float = 1,
popen: subprocess.Popen[t.Any] | None = None,
service_name: str | None = None,
protocol_version: str = LATEST_PROTOCOL_VERSION,
) -> bool: ...
@cached_contextmanager("{project_path}, {cleanup}")
def build(project_path: str, cleanup: bool = True) -> t.Generator[Bento, None, None]: ...
@cached_contextmanager("{bento_tag}, {image_tag}, {cleanup}, {use_grpc}")
def containerize(
bento_tag: str | Tag,
image_tag: str | None = None,
cleanup: bool = True,
use_grpc: bool = False,
backend: str = "docker",
**attrs: t.Any,
) -> t.Generator[str, None, None]: ...
@cached_contextmanager(...)
def run_bento_server_container(
image_tag: str,
config_file: str | None = None,
use_grpc: bool = False,
timeout: float = 90,
host: str = "127.0.0.1",
backend: str = "docker",
protocol_version: str = LATEST_PROTOCOL_VERSION,
platform: str = "linux/amd64",
): ...
@contextmanager
def run_bento_server_standalone(
bento: str,
use_grpc: bool = False,
config_file: str | None = None,
timeout: float = 90,
host: str = "127.0.0.1",
protocol_version: str = LATEST_PROTOCOL_VERSION,
): ...
@contextmanager
def run_bento_server_distributed(
bento_tag: str | Tag,
config_file: str | None = None,
use_grpc: bool = False,
timeout: float = 90,
host: str = "127.0.0.1",
protocol_version: str = LATEST_PROTOCOL_VERSION,
): ...
@cached_contextmanager(...)
def host_bento(
bento_name: str | Tag | None = None,
project_path: str = ".",
config_file: str | None = None,
deployment_mode: t.Literal["standalone", "distributed", "container"] = "standalone",
bentoml_home: str | None = None,
use_grpc: bool = False,
clean_context: contextlib.ExitStack | None = None,
host: str = "127.0.0.1",
timeout: float = 120,
backend: str = "docker",
protocol_version: str = LATEST_PROTOCOL_VERSION,
    container_mode_options: dict[str, t.Any] | None = None,
) -> t.Generator[str, None, None]: ...
Import
from bentoml.testing.server import host_bento, server_warmup, build, containerize
from bentoml.testing.server import run_bento_server_standalone
from bentoml.testing.server import run_bento_server_distributed
from bentoml.testing.server import run_bento_server_container
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bento_name | str \| Tag \| None | No | Bento tag to serve; if None, the project is built first |
| project_path | str | No | Path to the BentoML project directory (default ".") |
| config_file | str \| None | No | Path to a BentoML configuration YAML file |
| deployment_mode | Literal["standalone", "distributed", "container"] | No | How to run the server (default "standalone") |
| use_grpc | bool | No | Whether to launch a gRPC server instead of HTTP (default False) |
| host | str | No | Host address to bind (default "127.0.0.1") |
| timeout | float | No | Timeout in seconds for server warmup (default varies: 90-120) |
| backend | str | No | Container backend (default "docker") |
| protocol_version | str | No | gRPC protocol version string |
| clean_context | contextlib.ExitStack \| None | No | Shared exit stack for cleanup (useful in test sessions) |
Outputs
| Name | Type | Description |
|---|---|---|
| host_url | str | The host:port URL of the running server (e.g., "127.0.0.1:3000"), yielded by context managers |
| bento | Bento | The built Bento object (from build() context manager) |
| image_tag | str | The container image tag (from containerize() context manager) |
| warmup result | bool | True if server became ready within the timeout, False otherwise |
Usage Examples
# Host a bento in standalone mode for testing
from bentoml.testing.server import host_bento
with host_bento(
    bento_name="my_service:latest",
    deployment_mode="standalone",
    timeout=60,
) as host_url:
    # host_url is something like "127.0.0.1:3000"
    import requests

    resp = requests.get(f"http://{host_url}/readyz")
    assert resp.status_code == 200

# Host in distributed mode (simulating Yatai)
with host_bento(
    project_path="./my_project",
    deployment_mode="distributed",
    use_grpc=False,
) as host_url:
    # Test against the distributed server
    pass
# Use server_warmup directly
import asyncio
from bentoml.testing.server import server_warmup
ready = asyncio.run(server_warmup("127.0.0.1:3000", timeout=30))
assert ready