
Implementation:Bentoml BentoML Ray Integration

From Leeroopedia
Knowledge Sources
Domains Ray, Deployment, Distributed Computing
Last Updated 2026-02-13 15:00 GMT

Overview

Provides integration between BentoML services and Ray Serve, enabling deployment of BentoML bentos and runners as Ray Serve deployments.

Description

This module bridges BentoML with Ray Serve by converting BentoML services and runners into Ray Serve deployment objects. The main entry point is the deployment() function, which accepts a BentoML service (or Bento tag) and produces a bound Ray Serve deployment. Internally, each BentoML runner is wrapped in a RunnerDeployment class that is registered as a separate Ray Serve deployment, while the BentoML API server is wrapped in a BentoDeployment class that routes HTTP requests through BentoML's HTTPAppFactory.

The module supports Ray Serve's native batching via the @serve.batch decorator for runner methods that have batchable=True. Configuration for both the service deployment and the individual runner deployments can be specified through dictionaries. The helper function get_bento_runtime_env() is a placeholder for creating a Ray RuntimeEnv from a Bento's environment configs.
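The fan-out described above can be sketched in plain Python. This is a minimal illustration of the shape of the mapping, not BentoML's internals: plan_deployments and the "_service" key are hypothetical stand-ins for how deployment() pairs each runner name with its own Ray Serve deployment config alongside one config for the API server.

```python
def plan_deployments(runner_names, runners_config=None, service_config=None):
    """Map each runner to its own deployment config, plus one for the API server.

    Illustrative stand-in: deployment() registers one Ray Serve deployment
    per runner (RunnerDeployment) and one for the API server (BentoDeployment).
    """
    runners_config = runners_config or {}
    # Each runner gets its own config entry; missing runners fall back to {}.
    plan = {name: runners_config.get(name, {}) for name in runner_names}
    # The API server is a separate deployment with its own config.
    plan["_service"] = service_config or {}
    return plan


plan = plan_deployments(
    ["iris_clf"],
    runners_config={"iris_clf": {"num_replicas": 1}},
    service_config={"num_replicas": 3, "route_prefix": "/hello"},
)
# plan["iris_clf"] holds the runner's config; plan["_service"] the API server's
```

The point of the per-runner fan-out is that each runner becomes an independently scalable Ray Serve deployment, so a heavy model runner can get more replicas or CPUs than the HTTP-serving layer.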

Usage

Use this module when you want to deploy a BentoML service on a Ray cluster using Ray Serve. Because each runner becomes its own Ray Serve deployment, you can scale BentoML runners independently of the API server and of each other.

Code Reference

Source Location

Signature

def deployment(
    target: str | Tag | bentoml.Bento | bentoml.legacy.Service,
    service_deployment_config: dict[str, t.Any] | None = None,
    runners_deployment_config_map: dict[str, dict[str, t.Any]] | None = None,
    enable_batching: bool = False,
    batching_config: dict[str, dict[str, dict[str, float | int]]] | None = None,
) -> Deployment: ...

def get_bento_runtime_env(bento_tag: str | Tag) -> RuntimeEnv: ...

def _get_runner_deployment(
    svc: bentoml.legacy.Service,
    runner_name: str,
    runner_deployment_config: dict[str, t.Any],
    enable_batching: bool,
    batching_config: dict[str, dict[str, float | int]],
) -> Deployment: ...

def _get_service_deployment(svc: bentoml.legacy.Service, **kwargs: t.Any) -> Deployment: ...

def _deploy_bento_runners(
    svc: bentoml.legacy.Service,
    runners_deployment_config_map: dict | None = None,
    enable_batching: bool = False,
    batching_config: dict | None = None,
) -> dict[str, Deployment]: ...

Import

from bentoml._internal.ray import deployment
# or via the public API:
import bentoml
classifier = bentoml.ray.deployment("iris_classifier:latest")

I/O Contract

Inputs

Name Type Required Description
target str, Tag, Bento, or Service Yes A BentoML service instance, Bento tag string, or Bento object to deploy
service_deployment_config dict[str, Any] or None No Ray deployment config for the BentoML API server (e.g., num_replicas, route_prefix)
runners_deployment_config_map dict[str, dict[str, Any]] or None No Ray deployment config map keyed by runner name
enable_batching bool No Enable Ray Serve batching for batchable runner methods; defaults to False
batching_config dict or None No Ray batching config by runner name and method name (e.g., max_batch_size, batch_wait_timeout_s)
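The batching_config input is nested two levels deep: runner name, then method name, then the batch options. A small sketch of resolving that structure, where resolve_batch_options and the default values are illustrative assumptions rather than BentoML's actual lookup or defaults:

```python
# Hypothetical defaults for illustration only; BentoML/Ray Serve's real
# defaults may differ.
DEFAULTS = {"max_batch_size": 10, "batch_wait_timeout_s": 0.1}


def resolve_batch_options(batching_config, runner_name, method_name):
    """Look up per-method batch options, falling back to defaults.

    batching_config has the shape {runner_name: {method_name: options}}.
    """
    cfg = (batching_config or {}).get(runner_name, {}).get(method_name, {})
    # Explicit options override the defaults key by key.
    return {**DEFAULTS, **cfg}


opts = resolve_batch_options(
    {"ieee-fraud-detection-sm": {"predict_proba": {"max_batch_size": 5}}},
    "ieee-fraud-detection-sm",
    "predict_proba",
)
# opts: {"max_batch_size": 5, "batch_wait_timeout_s": 0.1}
```

Keying by runner and method matters because a single runner can expose several batchable methods, each with different latency and batch-size trade-offs.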

Outputs

Name Type Description
deployment ray.serve.Deployment A bound Ray Serve Deployment ready to be run with serve run

Usage Examples

import bentoml

# Basic deployment
classifier = bentoml.ray.deployment("iris_classifier:latest")

# Configured deployment with scaling
classifier = bentoml.ray.deployment(
    "iris_classifier:latest",
    {"route_prefix": "/hello", "num_replicas": 3, "ray_actor_options": {"num_cpus": 1}},
    {"iris_clf": {"num_replicas": 1, "ray_actor_options": {"num_cpus": 5}}},
)

# With batching enabled
deploy = bentoml.ray.deployment(
    "fraud_detection:latest",
    {"num_replicas": 5, "ray_actor_options": {"num_cpus": 1}},
    {"ieee-fraud-detection-sm": {"num_replicas": 1, "ray_actor_options": {"num_cpus": 5}}},
    enable_batching=True,
    batching_config={
        "ieee-fraud-detection-sm": {
            "predict_proba": {"max_batch_size": 5, "batch_wait_timeout_s": 0.2}
        }
    },
)
