Implementation:Bentoml BentoML Serve Http Production

From Leeroopedia
Metadata
Last Updated 2026-02-13 15:00 GMT

Overview

Concrete function for launching a BentoML service as a multi-process production HTTP server. The serve_http_production function creates a Circus-based process supervisor that manages multiple Uvicorn ASGI workers behind a shared socket. It is the internal engine behind the bentoml serve CLI command.

Description

The serve_http_production function performs the following steps:

  1. Parse the service module -- resolves the bento_identifier (e.g., "service:MyService") to locate the service class.
  2. Build the dependency graph -- discovers dependent services and plans worker pools for each.
  3. Bind the socket -- creates a TCP socket on the specified host:port with the given backlog.
  4. Configure Circus arbiter -- sets up Circus watchers (one per service in the graph), each spawning api_workers Uvicorn processes.
  5. Start the arbiter -- begins supervision, monitoring worker health, and handling signals.
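The first four steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not BentoML's actual code: parse_identifier, bind_socket, WatcherPlan, and plan_watchers are hypothetical names, the identifier is assumed to be in "module:ServiceClass" form, and a plain dataclass stands in for a Circus watcher.

```python
from __future__ import annotations

import socket
from dataclasses import dataclass


def parse_identifier(bento_identifier: str) -> tuple[str, str]:
    """Split "module:ServiceClass" into (module, attribute). Hypothetical helper."""
    module_name, sep, attr = bento_identifier.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:ServiceClass', got {bento_identifier!r}")
    return module_name, attr


def bind_socket(host: str, port: int, backlog: int) -> socket.socket:
    """Create a listening TCP socket that worker processes could share."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)  # backlog caps the queue of pending connections
    return sock


@dataclass
class WatcherPlan:
    # Hypothetical stand-in for a Circus watcher definition.
    name: str
    numprocesses: int


def plan_watchers(services: dict[str, int]) -> list[WatcherPlan]:
    """One watcher per service in the graph, each with its own worker count."""
    return [WatcherPlan(name=name, numprocesses=n) for name, n in services.items()]


module_name, class_name = parse_identifier("service:MyService")
listener = bind_socket("127.0.0.1", 0, backlog=2048)  # port 0: the OS picks a free port
bound_port = listener.getsockname()[1]
plans = plan_watchers({class_name: 4})
listener.close()
```

In the real function, step 5 would hand the watchers and the socket to the Circus arbiter; that part is omitted here because it depends on Circus itself.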

In development mode (development_mode=True), the function uses a simpler single-process setup; hot reload is available when reload=True. In production mode, it uses the full Circus supervisor with configurable worker counts, timeouts, and SSL.

Usage

This function is primarily invoked by the bentoml serve CLI command:

# Production mode (multi-process)
bentoml serve service:MyService --port 3000 --api-workers 4

# Development mode (single-process with reload)
bentoml serve service:MyService --reload --development

It can also be called programmatically (internal use):

from bentoml.serving import serve_http_production

server = serve_http_production(
    bento_identifier="service:MyService",
    working_dir=".",
    port=3000,
    host="0.0.0.0",
    api_workers=4,
)
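Because the function is internal, a script that needs to start a server may prefer to go through the CLI. The sketch below assembles the bentoml serve argument list using only flags that appear on this page; build_serve_command is a hypothetical helper, not part of BentoML.

```python
from __future__ import annotations

import shlex


def build_serve_command(
    bento_identifier: str,
    port: int = 3000,
    api_workers: int = 1,
    ssl_certfile: str | None = None,
    ssl_keyfile: str | None = None,
) -> list[str]:
    """Assemble an argv list for `bentoml serve` (hypothetical helper)."""
    cmd = [
        "bentoml", "serve", bento_identifier,
        "--port", str(port),
        "--api-workers", str(api_workers),
    ]
    # Optional TLS flags are added only when a path is supplied.
    if ssl_certfile is not None:
        cmd += ["--ssl-certfile", ssl_certfile]
    if ssl_keyfile is not None:
        cmd += ["--ssl-keyfile", ssl_keyfile]
    return cmd


cmd = build_serve_command("service:MyService", port=3000, api_workers=4)
command_line = shlex.join(cmd)  # shell-safe string form, handy for logging
```

The resulting list can be passed to subprocess.Popen(cmd) as-is; keeping it as a list avoids shell quoting problems.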

Code Reference

Source Location

  • Repository: bentoml/BentoML
  • File: src/bentoml/serving.py (lines 310-556)

Signature

def serve_http_production(
    bento_identifier: str,
    working_dir: str,
    port: int = 3000,
    host: str = "0.0.0.0",
    backlog: int = 2048,
    api_workers: int = 1,
    timeout: int | None = None,
    development_mode: bool = False,
    reload: bool = False,
    threaded: bool = False,
    ssl_certfile: str | None = None,
    ssl_keyfile: str | None = None,
    ...
) -> Server

Import

# Internal import (not typically used directly by end users)
from bentoml.serving import serve_http_production

# User-facing entry point is the CLI:
# bentoml serve service:MyService

I/O Contract

Inputs

Input Contract
Name Type Description
bento_identifier str Service module path in "module:ServiceClass" format (e.g., "service:MyService"). Can also be a Bento tag.
working_dir str Working directory for the service (used to resolve relative imports and file paths).
port int TCP port to listen on. Defaults to 3000.
host str Network interface to bind. Defaults to "0.0.0.0" (all interfaces).
backlog int Maximum number of queued connections. Defaults to 2048.
api_workers int Number of Uvicorn worker processes. Defaults to 1.
timeout int | None Request timeout in seconds. Defaults to None (no timeout).
development_mode bool If True, uses a single-process development server with more verbose logging. Defaults to False.
reload bool If True, enables hot reload on source file changes (development only). Defaults to False.
threaded bool If True, uses threaded workers instead of process-based workers. Defaults to False.
ssl_certfile str | None Path to SSL certificate file for HTTPS. Defaults to None.
ssl_keyfile str | None Path to SSL private key file for HTTPS. Defaults to None.
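The input contract can be mirrored in a small dataclass that checks values before anything is launched. This is a sketch only: ServeOptions and its validate method are hypothetical, and the rules it enforces (positive worker count, certificate and key supplied together, and so on) are assumptions rather than checks BentoML is known to perform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServeOptions:
    """Hypothetical container mirroring the documented parameters and defaults."""

    bento_identifier: str
    working_dir: str
    port: int = 3000
    host: str = "0.0.0.0"
    backlog: int = 2048
    api_workers: int = 1
    timeout: int | None = None
    ssl_certfile: str | None = None
    ssl_keyfile: str | None = None

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the options look sane."""
        problems: list[str] = []
        if not 0 < self.port < 65536:
            problems.append("port must be between 1 and 65535")
        if self.backlog < 1:
            problems.append("backlog must be at least 1")
        if self.api_workers < 1:
            problems.append("api_workers must be at least 1")
        if self.timeout is not None and self.timeout <= 0:
            problems.append("timeout must be positive when set")
        # Assumption: HTTPS needs both the certificate and the key.
        if (self.ssl_certfile is None) != (self.ssl_keyfile is None):
            problems.append("ssl_certfile and ssl_keyfile must be given together")
        return problems


good = ServeOptions("service:MyService", ".").validate()
bad = ServeOptions(
    "service:MyService", ".", api_workers=0, ssl_certfile="/etc/ssl/cert.pem"
).validate()
```

Returning every problem at once, instead of raising on the first, lets a caller report all misconfigurations in a single pass.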

Outputs

Output Contract
Name Type Description
Return value Server A Circus arbiter (or development server) managing the Uvicorn worker processes. The server runs until interrupted (SIGTERM/SIGINT) or programmatically stopped.

Usage Examples

Example 1: Basic Production Serving

Launch a service on port 3000 with 4 workers.

bentoml serve service:TextClassifier --port 3000 --api-workers 4
  • Circus supervises 4 Uvicorn worker processes.
  • Requests are distributed across workers via the shared socket.
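The shared-socket idea can be demonstrated in miniature: several acceptors block on one listening socket, and the operating system hands each incoming connection to exactly one of them. The sketch below uses threads in place of Uvicorn worker processes purely for portability, and the worker function and its reply format are hypothetical.

```python
import socket
import threading


def worker(listener: socket.socket, worker_id: int,
           handled: list, stop: threading.Event) -> None:
    """Accept connections from the shared listener until asked to stop."""
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            continue  # nothing arrived in this window; re-check the stop flag
        except OSError:
            return  # listener was closed
        with conn:
            conn.sendall(f"worker-{worker_id}\n".encode())
            handled.append(worker_id)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
listener.listen(16)
listener.settimeout(0.1)  # lets workers wake up periodically to check `stop`
port = listener.getsockname()[1]

stop = threading.Event()
handled: list = []
threads = [
    threading.Thread(target=worker, args=(listener, i, handled, stop))
    for i in range(4)
]
for t in threads:
    t.start()

# Eight sequential clients; each one is served by whichever worker accepts it.
replies = []
for _ in range(8):
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        with client.makefile("r") as reader:
            replies.append(reader.readline().strip())

stop.set()
for t in threads:
    t.join()
listener.close()
```

Which worker answers a given request is decided by the kernel and is not guaranteed to be evenly spread, so the sketch records the replies without assuming any particular distribution.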

Example 2: Development Mode with Reload

Launch in development mode for rapid iteration.

bentoml serve service:TextClassifier --development --reload
  • Runs a single-process server with verbose logging.
  • Automatically restarts when source files change.

Example 3: SSL-Enabled Production Server

Serve HTTPS traffic directly without a reverse proxy.

bentoml serve service:TextClassifier \
    --port 443 \
    --api-workers 8 \
    --ssl-certfile /etc/ssl/cert.pem \
    --ssl-keyfile /etc/ssl/key.pem
  • SSL termination is handled by Uvicorn within each worker.
  • Suitable for deployments where a separate reverse proxy is not available.
