Implementation: BentoML Ray Integration
| Knowledge Sources | |
|---|---|
| Domains | Ray, Deployment, Distributed Computing |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
Provides integration between BentoML services and Ray Serve, enabling deployment of BentoML bentos and runners as Ray Serve deployments.
Description
This module bridges BentoML with Ray Serve by converting BentoML services and runners into Ray Serve deployment objects. The main entry point is the deployment() function, which accepts a BentoML service (or Bento tag) and produces a bound Ray Serve deployment. Internally, each BentoML runner is wrapped in a RunnerDeployment class that is registered as a separate Ray Serve deployment, while the BentoML API server is wrapped in a BentoDeployment class that routes HTTP requests through BentoML's HTTPAppFactory. The module supports Ray Serve's native batching via the @serve.batch decorator for runner methods that have batchable=True. Configuration for both the service deployment and individual runner deployments can be specified through dictionaries. The helper function get_bento_runtime_env() is a placeholder for creating Ray RuntimeEnv from Bento environment configs.
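The per-runner wiring described above can be sketched in plain Python. This is a simplified illustration, not BentoML's actual implementation; resolve_runner_configs is a hypothetical helper name showing how each runner is paired with its entry from runners_deployment_config_map:

```python
from __future__ import annotations

from typing import Any


def resolve_runner_configs(
    runner_names: list[str],
    runners_deployment_config_map: dict[str, dict[str, Any]] | None = None,
) -> dict[str, dict[str, Any]]:
    # Each BentoML runner becomes its own Ray Serve deployment; runners
    # without an explicit entry fall back to an empty config so that
    # Ray Serve's defaults apply.
    config_map = runners_deployment_config_map or {}
    return {name: config_map.get(name, {}) for name in runner_names}


# Example: only one of two runners has explicit Ray deployment config.
configs = resolve_runner_configs(
    ["iris_clf", "preprocessor"],
    {"iris_clf": {"num_replicas": 2, "ray_actor_options": {"num_cpus": 2}}},
)
```

Keeping one config entry per runner is what lets each runner scale independently of the API server deployment.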
Usage
Use this module when you want to deploy a BentoML service on a Ray cluster using Ray Serve. It allows you to leverage Ray's distributed computing capabilities for scaling BentoML runners independently.
Code Reference
Source Location
- Repository: Bentoml_BentoML
- File: src/bentoml/_internal/ray/__init__.py
- Lines: 1-219
Signature
def deployment(
    target: str | Tag | bentoml.Bento | bentoml.legacy.Service,
    service_deployment_config: dict[str, t.Any] | None = None,
    runners_deployment_config_map: dict[str, dict[str, t.Any]] | None = None,
    enable_batching: bool = False,
    batching_config: dict[str, dict[str, dict[str, float | int]]] | None = None,
) -> Deployment: ...

def get_bento_runtime_env(bento_tag: str | Tag) -> RuntimeEnv: ...

def _get_runner_deployment(
    svc: bentoml.legacy.Service,
    runner_name: str,
    runner_deployment_config: dict[str, t.Any],
    enable_batching: bool,
    batching_config: dict[str, dict[str, float | int]],
) -> Deployment: ...

def _get_service_deployment(svc: bentoml.legacy.Service, **kwargs: t.Any) -> Deployment: ...

def _deploy_bento_runners(
    svc: bentoml.legacy.Service,
    runners_deployment_config_map: dict | None = None,
    enable_batching: bool = False,
    batching_config: dict | None = None,
) -> dict[str, Deployment]: ...
Import
from bentoml._internal.ray import deployment
# or via the public API:
import bentoml
classifier = bentoml.ray.deployment("iris_classifier:latest")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| target | str, Tag, Bento, or Service | Yes | A BentoML service instance, Bento tag string, or Bento object to deploy |
| service_deployment_config | dict[str, Any] or None | No | Ray deployment config for the BentoML API server (e.g., num_replicas, route_prefix) |
| runners_deployment_config_map | dict[str, dict[str, Any]] or None | No | Ray deployment config map keyed by runner name |
| enable_batching | bool | No | Enable Ray Serve batching for batchable runner methods; defaults to False |
| batching_config | dict or None | No | Ray batching config by runner name and method name (e.g., max_batch_size, batch_wait_timeout_s) |
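The batching_config parameter is a two-level nesting: runner name, then method name, then keyword arguments for Ray Serve's @serve.batch decorator. A minimal sketch of how such a structure can be consumed; batch_options is a hypothetical helper for illustration, not part of BentoML's API:

```python
from __future__ import annotations


def batch_options(
    batching_config: dict[str, dict[str, dict[str, float | int]]] | None,
    runner_name: str,
    method_name: str,
) -> dict[str, float | int]:
    # Two-level lookup: runner name, then method name. Missing keys
    # yield an empty dict, meaning no per-method overrides are applied.
    if not batching_config:
        return {}
    return batching_config.get(runner_name, {}).get(method_name, {})


# Mirrors the shape used in the usage examples below.
opts = batch_options(
    {
        "ieee-fraud-detection-sm": {
            "predict_proba": {"max_batch_size": 5, "batch_wait_timeout_s": 0.2}
        }
    },
    "ieee-fraud-detection-sm",
    "predict_proba",
)
```

Only methods declared with batchable=True in the runner are affected; other methods ignore these options.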
Outputs
| Name | Type | Description |
|---|---|---|
| deployment | ray.serve.Deployment | A bound Ray Serve Deployment, ready to be launched with serve.run() or the serve run CLI |
Usage Examples
import bentoml

# Basic deployment
classifier = bentoml.ray.deployment("iris_classifier:latest")

# Configured deployment with scaling
classifier = bentoml.ray.deployment(
    "iris_classifier:latest",
    {"route_prefix": "/hello", "num_replicas": 3, "ray_actor_options": {"num_cpus": 1}},
    {"iris_clf": {"num_replicas": 1, "ray_actor_options": {"num_cpus": 5}}},
)

# With batching enabled
deploy = bentoml.ray.deployment(
    "fraud_detection:latest",
    {"num_replicas": 5, "ray_actor_options": {"num_cpus": 1}},
    {"ieee-fraud-detection-sm": {"num_replicas": 1, "ray_actor_options": {"num_cpus": 5}}},
    enable_batching=True,
    batching_config={
        "ieee-fraud-detection-sm": {
            "predict_proba": {"max_batch_size": 5, "batch_wait_timeout_s": 0.2}
        }
    },
)