Heuristic:Bentoml BentoML Platform Serving Caveats

From Leeroopedia
Knowledge Sources
Domains: Debugging, Infrastructure
Last Updated: 2026-02-13 16:00 GMT

Overview

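Platform-specific caveats for BentoML production serving: Windows and WSL use TCP instead of Unix domain sockets for runner communication, gRPC production serving is unavailable on Windows, and gRPC production serving on macOS/FreeBSD behaves incorrectly because `SO_REUSEPORT` works differently there.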

Description

BentoML's production server (a circus-based multi-process architecture) behaves differently across platforms. On POSIX systems other than WSL, it uses Unix domain sockets for inter-process communication between API servers and runners, which is faster than TCP. On Windows and WSL, it falls back to TCP sockets on localhost. gRPC production serving is not supported on Windows; gRPC there requires `--development` mode. On macOS and FreeBSD, gRPC production serving starts but may behave incorrectly because `SO_REUSEPORT` behaves differently on those platforms. Additionally, the reloader plugin's output is suppressed on Windows due to circus limitations.

Usage

Use this heuristic when choosing a deployment platform or debugging platform-specific issues with BentoML serving. It matters most when deploying to Windows/WSL environments, when choosing between the HTTP and gRPC protocols, or when troubleshooting connection issues between API servers and runners.

The Insight (Rule of Thumb)

  • Linux (native): Full support. Uses Unix domain sockets for runner communication. All features work.
  • WSL: Detected via `"microsoft-standard" in platform.release()`. Falls back to TCP sockets instead of Unix sockets. Functions like Windows for socket handling.
  • Windows: TCP sockets for runner communication. gRPC production serving is NOT supported (raises `BentoMLException`). Must use `--development` mode for gRPC. Reloader plugin output is hidden.
  • macOS/FreeBSD: gRPC production serving has incorrect behavior due to `SO_REUSEPORT` differences. Recommendation: containerize as Linux container.
  • Unix socket path limit: Max path length is 103 characters (`MAX_AF_UNIX_PATH_LENGTH = 103`). Socket paths use temp directories with runner IDs.
  • Container detection: Inside containers, worker processes do not respawn on crash (`respawn=not running_inside_container()`). Container orchestrators handle restarts instead.
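The platform rules in the list above can be condensed into a small lookup. The following is a minimal sketch, not BentoML code: `ServingCaveats`, `serving_caveats`, and the status strings are hypothetical names introduced for illustration, and the function assumes inputs shaped like `platform.system()` and `platform.release()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingCaveats:
    """Hypothetical summary of the caveats listed above for one platform."""
    runner_transport: str  # "unix-socket" or "tcp"
    grpc_production: str   # "ok", "warns", or "blocked"


def serving_caveats(system: str, release: str = "") -> ServingCaveats:
    """Map a platform to the transport and gRPC status described above.

    `system` is expected to look like platform.system() ("Linux", "Windows",
    "Darwin", "FreeBSD") and `release` like platform.release().
    """
    # Same substring test the page quotes for WSL detection.
    is_wsl = "microsoft-standard" in release

    if system == "Windows":
        # TCP runner sockets; gRPC needs --development.
        return ServingCaveats("tcp", "blocked")
    if system in ("Darwin", "FreeBSD"):
        # POSIX, so Unix sockets; gRPC starts but logs a warning.
        return ServingCaveats("unix-socket", "warns")
    if system == "Linux":
        # WSL only changes the transport; the quoted gRPC checks do not fire.
        return ServingCaveats("tcp" if is_wsl else "unix-socket", "ok")
    # Assumption of this sketch: anything else is out of scope.
    raise ValueError(f"platform not covered by this sketch: {system}")


# Example call for the current machine:
#   import platform
#   serving_caveats(platform.system(), platform.release())
```

A check like this could run in a deployment preflight script to flag a gRPC service that is about to be started on a platform where production mode is blocked or unreliable.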

Reasoning

Unix domain sockets provide lower latency and higher throughput than TCP for local inter-process communication, which is why BentoML prefers them on POSIX systems. The WSL exception exists because WSL's Unix socket implementation has historically had reliability issues. BentoML's gRPC production server relies on `SO_REUSEPORT` to share one listening port across worker processes. Windows does not provide this option, and macOS/FreeBSD implement it differently from Linux.
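To see whether a given host exposes `SO_REUSEPORT` at all, a probe along the following lines may help. It is a sketch using only the Python standard library; the function names are hypothetical and it is not part of BentoML. It only checks that two sockets can bind the same port, which says nothing about how the kernel distributes incoming connections between them.

```python
import socket


def reuseport_available() -> bool:
    """Return True if Python exposes SO_REUSEPORT and the OS accepts it."""
    if not hasattr(socket, "SO_REUSEPORT"):
        # The constant is typically missing on Windows builds of CPython.
        return False
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    except OSError:
        return False
    return True


def can_share_port(host: str = "127.0.0.1") -> bool:
    """Try to bind two listening sockets to one port, as several workers would."""
    if not reuseport_available():
        return False
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as second:
                # The option must be set on every socket before bind().
                for sock in (first, second):
                    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
                first.bind((host, 0))  # port 0: let the OS pick a free port
                first.listen()
                port = first.getsockname()[1]
                second.bind((host, port))
                second.listen()
    except OSError:
        return False
    return True
```

A `False` result on Windows matches the restriction described here. A `True` result on macOS or FreeBSD does not mean the production server will balance load correctly, since the differences are in connection distribution rather than in binding.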

WSL detection and socket selection from `serving.py:49, 356-363`:

IS_WSL = "microsoft-standard" in platform.release()

if psutil.POSIX and not IS_WSL:
    uds_path = tempfile.mkdtemp()
    get_socket_func = _get_runner_socket_posix
elif psutil.WINDOWS or IS_WSL:
    get_socket_func = _get_runner_socket_windows
else:
    raise NotImplementedError(f"Unsupported platform: {sys.platform}")
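Because the Unix socket branch places sockets under a temporary directory, the 103-character path limit noted earlier can be hit when the temp directory or runner ID is long. The sketch below shows one way to fail early. `runner_socket_path` and the `<runner_id>.sock` file layout are assumptions made for illustration; only the constant's value comes from this page.

```python
import os

# Value quoted on this page for BentoML's limit.
MAX_AF_UNIX_PATH_LENGTH = 103


def runner_socket_path(uds_dir: str, runner_id: str) -> str:
    """Build a socket path for one runner and reject it if it is too long.

    The "<runner_id>.sock" naming is hypothetical, chosen for this sketch.
    """
    path = os.path.join(uds_dir, f"{runner_id}.sock")
    # The OS limit applies to the encoded bytes, not to Python characters.
    length = len(os.fsencode(path))
    if length > MAX_AF_UNIX_PATH_LENGTH:
        raise ValueError(
            f"socket path is {length} bytes, limit is "
            f"{MAX_AF_UNIX_PATH_LENGTH}: {path}"
        )
    return path
```

For example, `runner_socket_path(tempfile.mkdtemp(), "iris_clf")` returns a usable path on most Linux hosts, while a deeply nested `TMPDIR` or a very long runner name raises `ValueError` before any socket is created.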

gRPC Windows restriction from `serving.py:604-607`:

if psutil.WINDOWS and (not development_mode):
    raise BentoMLException(
        "'grpc' is not supported on Windows without '--development'. "
        "The reason being SO_REUSEPORT socket option is only available on "
        "UNIX system, and gRPC implementation depends on this behaviour."
    )

macOS/FreeBSD gRPC warning from `serving.py:608-612`:

if psutil.MACOS or psutil.FREEBSD:
    logger.warning(
        "Due to gRPC implementation on exposing SO_REUSEPORT, BentoML "
        "production server's behaviour on %s is not correct. We recommend "
        "to containerize BentoServer as a Linux container instead.",
        "MacOS" if psutil.MACOS else "FreeBSD",
    )

Container respawn control from `serving.py:125`:

respawn=not running_inside_container(),
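The respawn rule can be illustrated with a rough container check. This is a sketch under stated assumptions and not BentoML's `running_inside_container()`: the markers used here (the `KUBERNETES_SERVICE_HOST` variable, `/.dockerenv`, and cgroup names) are common heuristics, and `looks_like_container` and `watcher_options` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def looks_like_container(
    env: Optional[Mapping[str, str]] = None,
    dockerenv: Path = Path("/.dockerenv"),
    cgroup_file: Path = Path("/proc/1/cgroup"),
) -> bool:
    """Heuristic container check (assumption: not BentoML's implementation)."""
    env = os.environ if env is None else env
    # Kubernetes injects this variable into pods.
    if "KUBERNETES_SERVICE_HOST" in env:
        return True
    # Docker creates this marker file at the container root.
    if dockerenv.exists():
        return True
    try:
        cgroup = cgroup_file.read_text()
    except OSError:
        return False  # no procfs (e.g. macOS) or unreadable file
    return any(m in cgroup for m in ("docker", "kubepods", "containerd"))


def watcher_options(in_container: bool) -> dict:
    """Mirror the rule above: let the orchestrator restart crashed workers."""
    return {"respawn": not in_container}
```

The design intent is to avoid two supervisors competing: inside a container, a crashed worker surfaces as a failed container that the orchestrator restarts, whereas on a bare host the process manager respawns the worker itself.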
