Implementation:Bentoml BentoML Serve Http Production

**Metadata**
Knowledge Sources	BentoML BentoML Serving
Domains	ML_Serving Production_Infrastructure
Last Updated	2026-02-13 15:00 GMT

Overview

serve_http_production is the concrete function that launches a BentoML service as a multi-process production HTTP server. It creates a Circus-based process supervisor that manages multiple Uvicorn ASGI workers behind a shared socket, and it is the internal engine behind the bentoml serve CLI command.

Description

The serve_http_production function performs the following steps:

Parse the service module -- resolves the bento_identifier (e.g., "service:MyService") to locate the service class.
Build the dependency graph -- discovers dependent services and plans worker pools for each.
Bind the socket -- creates a TCP socket on the specified host:port with the given backlog.
Configure Circus arbiter -- sets up Circus watchers (one per service in the graph), each spawning api_workers Uvicorn processes.
Start the arbiter -- begins supervision, monitoring worker health, and handling signals.
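
The planning part of these steps (one watcher per service in the dependency graph, each with its own worker count) can be illustrated with a small standard-library sketch. This is not BentoML's actual code: WatcherPlan, plan_watchers, and the example service names are hypothetical, and the real function hands its configuration to Circus rather than returning plain records.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WatcherPlan:
    """Hypothetical record: one supervised watcher per service."""
    name: str
    num_workers: int


def plan_watchers(
    root: str,
    dependencies: dict[str, list[str]],
    workers: dict[str, int],
) -> list[WatcherPlan]:
    """Walk the dependency graph from the root service and emit one plan
    per distinct service, defaulting to a single worker."""
    plans: list[WatcherPlan] = []
    seen: set[str] = set()
    stack = [root]
    while stack:
        service = stack.pop()
        if service in seen:
            continue  # a shared dependency gets only one watcher
        seen.add(service)
        plans.append(WatcherPlan(service, workers.get(service, 1)))
        # Reverse so that dependencies are visited in declaration order.
        stack.extend(reversed(dependencies.get(service, [])))
    return plans


if __name__ == "__main__":
    graph = {
        "MyService": ["Preprocessor", "Embedder"],
        "Preprocessor": ["Embedder"],
    }
    for plan in plan_watchers("MyService", graph, {"MyService": 4}):
        print(plan)
```

A service that several others depend on (Embedder here) appears once in the plan, which matches the "one watcher per service" description above.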

In development mode (development_mode=True), the function uses a simpler single-process setup with hot reload support. In production mode, it uses the full Circus supervisor with configurable worker counts, timeouts, and SSL.

Usage

This function is primarily invoked by the bentoml serve CLI command:

# Production mode (multi-process)
bentoml serve service:MyService --port 3000 --api-workers 4

# Development mode (single-process with reload)
bentoml serve service:MyService --reload --development

It can also be called programmatically (internal use):

from bentoml.serving import serve_http_production

server = serve_http_production(
    bento_identifier="service:MyService",
    working_dir=".",
    port=3000,
    host="0.0.0.0",
    api_workers=4,
)

Code Reference

Source Location

Repository: bentoml/BentoML
File: src/bentoml/serving.py (lines 310--556)

Signature

def serve_http_production(
    bento_identifier: str,
    working_dir: str,
    port: int = 3000,
    host: str = "0.0.0.0",
    backlog: int = 2048,
    api_workers: int = 1,
    timeout: int | None = None,
    development_mode: bool = False,
    reload: bool = False,
    threaded: bool = False,
    ssl_certfile: str | None = None,
    ssl_keyfile: str | None = None,
    ...
) -> Server

Import

# Internal import (not typically used directly by end users)
from bentoml.serving import serve_http_production

# User-facing entry point is the CLI:
# bentoml serve service:MyService

I/O Contract

Inputs

**Input Contract**
Name	Type	Description
`bento_identifier`	str	Service module path in `"module:ServiceClass"` format (e.g., `"service:MyService"`). Can also be a Bento tag.
`working_dir`	str	Working directory for the service (used to resolve relative imports and file paths).
`port`	int	TCP port to listen on. Defaults to `3000`.
`host`	str	Network interface to bind. Defaults to `"0.0.0.0"` (all interfaces).
`backlog`	int	Maximum number of queued connections. Defaults to `2048`.
`api_workers`	int	Number of Uvicorn worker processes. Defaults to `1`.
`timeout`	int | None	Request timeout in seconds. Defaults to `None` (no timeout).
`development_mode`	bool	If `True`, uses a single-process development server with more verbose logging. Defaults to `False`.
`reload`	bool	If `True`, enables hot reload on source file changes (development only). Defaults to `False`.
`threaded`	bool	If `True`, uses threaded workers instead of process-based workers. Defaults to `False`.
`ssl_certfile`	str | None	Path to SSL certificate file for HTTPS. Defaults to `None`.
`ssl_keyfile`	str | None	Path to SSL private key file for HTTPS. Defaults to `None`.
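
To make the `"module:ServiceClass"` form of `bento_identifier` concrete, the following sketch splits an identifier and imports the named attribute with importlib. The helpers split_identifier and load_attribute are hypothetical and are not part of BentoML. The sketch covers only the module path form, not Bento tags, and it ignores `working_dir`.

```python
from __future__ import annotations

import importlib


def split_identifier(bento_identifier: str) -> tuple[str, str]:
    """Split "module:ServiceClass" into (module name, attribute name)."""
    module_name, sep, attr = bento_identifier.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(
            f"expected 'module:ServiceClass', got {bento_identifier!r}"
        )
    return module_name, attr


def load_attribute(bento_identifier: str) -> object:
    """Import the module and return the named attribute from it."""
    module_name, attr = split_identifier(bento_identifier)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


if __name__ == "__main__":
    print(split_identifier("service:MyService"))
    # A standard-library module stands in for a real service module here.
    print(load_attribute("collections:OrderedDict"))
```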

Outputs

**Output Contract**
Name	Type	Description
Return value	Server	A Circus arbiter (or development server) managing the Uvicorn worker processes. The server runs until interrupted (SIGTERM/SIGINT) or programmatically stopped.

Usage Examples

Example 1: Basic Production Serving

Launch a service on port 3000 with 4 workers.

bentoml serve service:TextClassifier --port 3000 --api-workers 4

Circus supervises 4 Uvicorn worker processes.
Requests are distributed across workers via the shared socket.
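
The shared-socket idea can be sketched with the standard library alone: the parent binds and listens once, marks the descriptor inheritable, and each worker process would then accept connections from that same descriptor. This is an illustration under stated assumptions, not BentoML's or Circus's implementation. bind_shared_socket is a hypothetical helper, and the demo accepts a connection in the same process purely to show that the listener works.

```python
import socket


def bind_shared_socket(host: str, port: int, backlog: int = 2048) -> socket.socket:
    """Bind and listen once in the parent process.

    Worker processes started afterwards could inherit the descriptor and
    call accept() on it, so the kernel hands each connection to one worker.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)       # backlog = maximum number of queued connections
    sock.set_inheritable(True)  # let child processes inherit the descriptor
    return sock


if __name__ == "__main__":
    # Port 0 asks the OS for any free port, which keeps the demo self-contained.
    listener = bind_shared_socket("127.0.0.1", 0)
    host, port = listener.getsockname()
    with socket.create_connection((host, port)):
        conn, peer = listener.accept()
        print(f"accepted connection from {peer} on fd {listener.fileno()}")
        conn.close()
    listener.close()
```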

Example 2: Development Mode with Reload

Launch in development mode for rapid iteration.

bentoml serve service:TextClassifier --development --reload

Runs a single-process server with verbose logging.
Automatically restarts when source files change.

Example 3: SSL-Enabled Production Server

Serve HTTPS traffic directly without a reverse proxy.

bentoml serve service:TextClassifier \
    --port 443 \
    --api-workers 8 \
    --ssl-certfile /etc/ssl/cert.pem \
    --ssl-keyfile /etc/ssl/key.pem

SSL termination is handled by Uvicorn within each worker.
Suitable for deployments where a separate reverse proxy is not available.
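
When the CLI is driven from a deployment script, the command lines in the examples above can be assembled programmatically. The sketch below uses only flags that appear on this page. build_serve_command is a hypothetical helper, and its rule that the certificate and key must be given together is an assumption of the sketch, not documented BentoML behavior.

```python
from __future__ import annotations


def build_serve_command(
    target: str,
    port: int = 3000,
    api_workers: int = 1,
    ssl_certfile: str | None = None,
    ssl_keyfile: str | None = None,
) -> list[str]:
    """Build an argv list for `bentoml serve` from the flags shown above."""
    # Assumption of this sketch: require both SSL paths or neither.
    if (ssl_certfile is None) != (ssl_keyfile is None):
        raise ValueError("ssl_certfile and ssl_keyfile must be provided together")
    cmd = [
        "bentoml", "serve", target,
        "--port", str(port),
        "--api-workers", str(api_workers),
    ]
    if ssl_certfile is not None and ssl_keyfile is not None:
        cmd += ["--ssl-certfile", ssl_certfile, "--ssl-keyfile", ssl_keyfile]
    return cmd


if __name__ == "__main__":
    cmd = build_serve_command(
        "service:TextClassifier",
        port=443,
        api_workers=8,
        ssl_certfile="/etc/ssl/cert.pem",
        ssl_keyfile="/etc/ssl/key.pem",
    )
    print(" ".join(cmd))
    # To launch for real: subprocess.run(cmd, check=True)
```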
