# Implementation: BentoML Service Decorator
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
## Overview
Concrete decorator for transforming a plain Python class into a production-grade BentoML service. The @bentoml.service decorator is the primary entry point for defining ML serving endpoints in BentoML, wrapping user classes in a Service[T] instance that manages HTTP routing, resource allocation, and lifecycle.
## Description

The `@bentoml.service` decorator inspects the target class for methods annotated with `@bentoml.api`, builds a route table, and wraps the class inside a `Service[T]` proxy. The decorator supports two invocation styles:

- Bare decorator -- `@bentoml.service` with no arguments, applied directly to the class.
- Parameterized decorator -- `@bentoml.service(name="...", resources={...}, ...)`, which returns a decorator that is then applied to the class.

Internally, the function checks whether its first positional argument is a class or `None`. If a class is provided, it wraps it immediately; otherwise, it returns a decorator that wraps the class when later applied.
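The dual-mode dispatch described above can be sketched in plain Python. This is an illustrative simplification, not BentoML's actual implementation: the real decorator returns a `Service[T]` proxy, while this sketch merely attaches the configuration to the class.

```python
def service(inner=None, /, **config):
    # Sketch of the bare-vs-parameterized decorator pattern.
    def wrap(cls):
        # Stand-in for building the Service[T] wrapper.
        cls.__service_config__ = config
        return cls

    if inner is not None:
        # Bare usage: @service applied directly to the class.
        return wrap(inner)
    # Parameterized usage: @service(...) returns the decorator.
    return wrap


@service
class Ping: ...

@service(name="echo-svc")
class Echo: ...
```

The `inner=None, /` positional-only parameter is what lets the same function serve both call styles without ambiguity between the class argument and the keyword configuration.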
## Usage

Import and apply the decorator:

```python
import bentoml

@bentoml.service(
    name="text-classifier",
    resources={"gpu": 1},
    traffic={"timeout": 60},
)
class TextClassifier:
    def __init__(self):
        self.model = load_my_model()

    @bentoml.api
    def classify(self, text: str) -> dict:
        return self.model.predict(text)
```
## Code Reference

### Source Location

- Repository: `bentoml/BentoML`
- File: `src/_bentoml_sdk/service/factory.py` (lines 532--606)
### Signature

```python
def service(
    inner: type[T] | None = None,
    /,
    *,
    name: str | None = None,
    image: Image | None = None,
    description: str | None = None,
    path_prefix: str | None = None,
    envs: list[ServiceEnvConfig] | None = None,
    labels: dict[str, str] | None = None,
    cmd: list[str] | None = None,
    service_class: type[Service[T]] = Service,
    **kwargs: Unpack[Config],
) -> Any
```
### Import

```python
import bentoml

# Used as:
@bentoml.service
class MyService: ...

# Or with parameters:
@bentoml.service(name="my-svc", resources={"gpu": 1})
class MyService: ...
```
## I/O Contract

### Inputs

| Name | Type | Description |
|---|---|---|
| `inner` | `type[T] \| None` | The Python class to wrap. Provided implicitly when the decorator is used without parentheses; `None` when used with keyword arguments. |
| `name` | `str \| None` | Custom service name. Defaults to the class name converted to a hyphenated slug. |
| `image` | `Image \| None` | Docker image configuration (base image, Python packages, system packages, etc.). |
| `description` | `str \| None` | Human-readable description of the service. |
| `path_prefix` | `str \| None` | URL path prefix for all routes in this service (e.g., `/v1`). |
| `envs` | `list[ServiceEnvConfig] \| None` | Environment variables to set in the container at runtime. |
| `labels` | `dict[str, str] \| None` | Key-value labels for metadata (used in BentoCloud and container registries). |
| `cmd` | `list[str] \| None` | Custom command for the container entry point. |
| `service_class` | `type[Service[T]]` | The `Service` wrapper class. Defaults to `Service`. |
| `**kwargs` | `Unpack[Config]` | Additional configuration conforming to the `ServiceConfig` TypedDict, including `resources` (gpu, memory, cpu), `traffic` (timeout, concurrency, max_batch_size), and `workers` (count per service). |
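The shape of the `**kwargs` configuration can be approximated with TypedDicts. The field names below follow the descriptions in this document; they are an illustrative subset, not BentoML's full `ServiceConfig` definition.

```python
from __future__ import annotations

from typing import TypedDict


class TrafficConfig(TypedDict, total=False):
    # Illustrative traffic options named in the table above.
    timeout: int
    concurrency: int
    max_batch_size: int


class Config(TypedDict, total=False):
    # Illustrative subset of the documented ServiceConfig fields.
    resources: dict[str, int | str]
    traffic: TrafficConfig
    workers: int


cfg: Config = {
    "resources": {"gpu": 1, "memory": "16Gi"},
    "traffic": {"timeout": 300},
}
```

Because the TypedDicts use `total=False`, every field is optional, matching how the decorator accepts any subset of the configuration keys.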
### Outputs

| Name | Type | Description |
|---|---|---|
| Return value | `Service[T]` | A `Service[T]` instance wrapping the original class. This object holds the route table and resource configuration, and acts as the entry point for both local development and production serving. |
## Usage Examples

### Example 1: Minimal Service

A bare-bones service with no extra configuration.

```python
import bentoml

@bentoml.service
class HealthCheck:
    @bentoml.api
    def ping(self) -> str:
        return "pong"
```

- The decorator is applied without parentheses, so `inner` receives the class directly.
- The service name defaults to `"health-check"`.
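The default-name behavior can be illustrated with a small helper. `to_slug` is hypothetical (BentoML's internal slugging logic may differ), but it reproduces the documented default of converting a CamelCase class name to a hyphenated lowercase service name.

```python
import re


def to_slug(class_name: str) -> str:
    # Hypothetical helper mirroring the documented default:
    # insert a hyphen before each interior uppercase letter, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", class_name).lower()


print(to_slug("HealthCheck"))     # health-check
print(to_slug("TextClassifier"))  # text-classifier
```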
### Example 2: GPU Service with Image Config

A service with explicit resource requirements and a custom Docker image.

```python
import bentoml
from bentoml.models import HuggingFaceModel

@bentoml.service(
    name="llm-service",
    image=bentoml.images.PythonImage(
        python_version="3.11",
        requirements_file="requirements.txt",
    ),
    resources={"gpu": 1, "memory": "16Gi"},
    traffic={"timeout": 300},
)
class LLMService:
    model_ref = HuggingFaceModel("meta-llama/Llama-2-7b-hf")

    def __init__(self):
        import transformers

        self.pipeline = transformers.pipeline(
            "text-generation", model=self.model_ref.resolve()
        )

    @bentoml.api
    def generate(self, prompt: str) -> str:
        return self.pipeline(prompt)[0]["generated_text"]
```

- `resources` and `traffic` are passed through `**kwargs` and parsed as `ServiceConfig`.
- `image` specifies the Docker build configuration used during `bentoml build`.