Implementation:Bentoml BentoML Serve Http Production
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Concrete function for launching a BentoML service as a multi-process production HTTP server. The serve_http_production function creates a Circus-based process supervisor that manages multiple Uvicorn ASGI workers behind a shared socket. It is the internal engine behind the bentoml serve CLI command.
Description
The serve_http_production function performs the following steps:
- Parse the service module -- resolves the bento_identifier (e.g., "service:MyService") to locate the service class.
- Build the dependency graph -- discovers dependent services and plans worker pools for each.
- Bind the socket -- creates a TCP socket on the specified host:port with the given backlog.
- Configure Circus arbiter -- sets up Circus watchers (one per service in the graph), each spawning api_workers Uvicorn processes.
- Start the arbiter -- begins supervision, monitoring worker health, and handling signals.
In development mode (development_mode=True), the function uses a simpler single-process setup with hot reload support. In production mode, it uses the full Circus supervisor with configurable worker counts, timeouts, and SSL.
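The socket-binding step can be illustrated with a minimal standard-library sketch. This is not BentoML's or Circus's actual code: the helper name bind_listening_socket is hypothetical, and marking the descriptor inheritable is an assumption about how a pre-bound socket is typically shared with child worker processes.

```python
import socket


def bind_listening_socket(host: str, port: int, backlog: int = 2048) -> socket.socket:
    """Create a TCP socket bound to host:port and start listening on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow a quick restart on the same address after the server stops.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # backlog caps the number of connections queued before accept() is called.
    sock.listen(backlog)
    # Assumption: child processes inherit this descriptor and accept() on it,
    # so the kernel hands each incoming connection to one of the workers.
    sock.set_inheritable(True)
    return sock


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; a real server would use e.g. 3000.
    listener = bind_listening_socket("127.0.0.1", 0)
    print("listening on", listener.getsockname())
    listener.close()
```

Because the parent binds once and every worker accepts on the same descriptor, no separate load balancer process is needed between the socket and the workers.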
Usage
This function is primarily invoked by the bentoml serve CLI command:
# Production mode (multi-process)
bentoml serve service:MyService --port 3000 --api-workers 4
# Development mode (single-process with reload)
bentoml serve service:MyService --reload --development
It can also be called programmatically (internal use):
from bentoml.serving import serve_http_production
server = serve_http_production(
    bento_identifier="service:MyService",
    working_dir=".",
    port=3000,
    host="0.0.0.0",
    api_workers=4,
)
Code Reference
Source Location
- Repository: bentoml/BentoML
- File: src/bentoml/serving.py (lines 310--556)
Signature
def serve_http_production(
    bento_identifier: str,
    working_dir: str,
    port: int = 3000,
    host: str = "0.0.0.0",
    backlog: int = 2048,
    api_workers: int = 1,
    timeout: int | None = None,
    development_mode: bool = False,
    reload: bool = False,
    threaded: bool = False,
    ssl_certfile: str | None = None,
    ssl_keyfile: str | None = None,
    ...
) -> Server
Import
# Internal import (not typically used directly by end users)
from bentoml.serving import serve_http_production
# User-facing entry point is the CLI:
# bentoml serve service:MyService
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| bento_identifier | str | Service module path in "module:ServiceClass" format (e.g., "service:MyService"). Can also be a Bento tag. |
| working_dir | str | Working directory for the service (used to resolve relative imports and file paths). |
| port | int | TCP port to listen on. Defaults to 3000. |
| host | str | Network interface to bind. Defaults to "0.0.0.0" (all interfaces). |
| backlog | int | Maximum number of queued connections. Defaults to 2048. |
| api_workers | int | Number of Uvicorn worker processes. Defaults to 1. |
| timeout | int or None | Request timeout in seconds. Defaults to None (no timeout). |
| development_mode | bool | If True, uses a single-process development server with more verbose logging. Defaults to False. |
| reload | bool | If True, enables hot reload on source file changes (development only). Defaults to False. |
| threaded | bool | If True, uses threaded workers instead of process-based workers. Defaults to False. |
| ssl_certfile | str or None | Path to SSL certificate file for HTTPS. Defaults to None. |
| ssl_keyfile | str or None | Path to SSL private key file for HTTPS. Defaults to None. |
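To make the "module:ServiceClass" format concrete, the following sketch splits such an identifier and imports the named attribute. It is an illustration only: split_identifier and locate_attribute are hypothetical helpers, not BentoML functions, and the sketch handles only the module path form, not Bento tags.

```python
import importlib
from typing import Any, Tuple


def split_identifier(bento_identifier: str) -> Tuple[str, str]:
    """Split "module:ServiceClass" into (module name, attribute name)."""
    module_name, sep, attr_name = bento_identifier.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(
            f'expected "module:ServiceClass", got {bento_identifier!r}'
        )
    return module_name, attr_name


def locate_attribute(bento_identifier: str) -> Any:
    """Import the module and return the attribute the identifier names."""
    module_name, attr_name = split_identifier(bento_identifier)
    # The module must be importable from sys.path; a real loader would
    # presumably take working_dir into account as well (assumption).
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


if __name__ == "__main__":
    print(split_identifier("service:MyService"))
    # A standard-library module stands in for a real service module here.
    print(locate_attribute("collections:OrderedDict"))
```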
Outputs
| Name | Type | Description |
|---|---|---|
| Return value | Server | A Circus arbiter (or development server) managing the Uvicorn worker processes. The server runs until interrupted (SIGTERM/SIGINT) or programmatically stopped. |
Usage Examples
Example 1: Basic Production Serving
Launch a service on port 3000 with 4 workers.
bentoml serve service:TextClassifier --port 3000 --api-workers 4
- Circus supervises 4 Uvicorn worker processes.
- Requests are distributed across workers via the shared socket.
Example 2: Development Mode with Reload
Launch in development mode for rapid iteration.
bentoml serve service:TextClassifier --development --reload
- Runs a single-process server with verbose logging.
- Automatically restarts when source files change.
Example 3: SSL-Enabled Production Server
Serve HTTPS traffic directly without a reverse proxy.
bentoml serve service:TextClassifier \
    --port 443 \
    --api-workers 8 \
    --ssl-certfile /etc/ssl/cert.pem \
    --ssl-keyfile /etc/ssl/key.pem
- SSL termination is handled by Uvicorn within each worker.
- Suitable for deployments where a separate reverse proxy is not available.
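When the CLI is driven from a script, the three examples above can be assembled from one helper. The sketch below is a hypothetical convenience wrapper, not part of BentoML; it uses only the flags that appear in the examples on this page (--port, --api-workers, --ssl-certfile, --ssl-keyfile, --development, --reload).

```python
from typing import List, Optional


def build_serve_command(
    bento_identifier: str,
    port: int = 3000,
    api_workers: int = 1,
    ssl_certfile: Optional[str] = None,
    ssl_keyfile: Optional[str] = None,
    development: bool = False,
    reload: bool = False,
) -> List[str]:
    """Return the argv list for a `bentoml serve` invocation."""
    cmd = [
        "bentoml", "serve", bento_identifier,
        "--port", str(port),
        "--api-workers", str(api_workers),
    ]
    # Optional flags are appended only when the caller sets them.
    if ssl_certfile is not None:
        cmd += ["--ssl-certfile", ssl_certfile]
    if ssl_keyfile is not None:
        cmd += ["--ssl-keyfile", ssl_keyfile]
    if development:
        cmd.append("--development")
    if reload:
        cmd.append("--reload")
    return cmd


if __name__ == "__main__":
    # Reproduces Example 3 as an argument list.
    print(build_serve_command(
        "service:TextClassifier",
        port=443,
        api_workers=8,
        ssl_certfile="/etc/ssl/cert.pem",
        ssl_keyfile="/etc/ssl/key.pem",
    ))
```

The resulting list can be passed to subprocess.run(cmd, check=True); building a list rather than a shell string avoids quoting problems in file paths.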
Related Pages
- Principle:Bentoml_BentoML_HTTP_Production_Serving
- Environment:Bentoml_BentoML_Python_Runtime
- Environment:Bentoml_BentoML_NVIDIA_GPU_Resource
- Environment:Bentoml_BentoML_Triton_Inference_Server
- Heuristic:Bentoml_BentoML_Adaptive_Batching_Tuning
- Heuristic:Bentoml_BentoML_Worker_Count_Strategy
- Heuristic:Bentoml_BentoML_Thread_Env_Vars_Setting
- Heuristic:Bentoml_BentoML_Platform_Serving_Caveats