Principle: BentoML Service Class Definition
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
A design pattern for defining ML model serving endpoints as decorated Python classes. In BentoML, a service is a plain Python class transformed into a production-grade serving component via the @bentoml.service decorator, which adds HTTP/gRPC endpoint registration, resource configuration, and lifecycle management.
Description
Service class definition in BentoML is centered on the decorator pattern: the @bentoml.service decorator wraps a user-authored Python class in a Service[T] instance. This wrapper is responsible for:
- Endpoint registration -- methods decorated with `@bentoml.api` become HTTP (or gRPC) endpoints with automatic request/response serialization.
- Resource configuration -- CPU, memory, GPU, and concurrency limits are specified as keyword arguments to the decorator and enforced at runtime.
- Lifecycle management -- the class `__init__` method runs once per worker process at startup, making it the natural place to load models and initialize expensive resources.
- Dependency composition -- services can depend on other services, forming a directed acyclic graph that BentoML orchestrates across separate worker pools.
The decorator accepts optional parameters for naming, Docker image configuration, environment variables, labels, and a path prefix for URL routing. All resource and traffic configuration is passed via `**kwargs` conforming to the `ServiceConfig` TypedDict (covering the `resources`, `traffic`, and `workers` settings).
Usage
Use the @bentoml.service decorator when you need to:
- Expose one or more ML model inference functions as network-accessible API endpoints.
- Declare resource requirements (GPUs, memory) that BentoML uses for scheduling and containerization.
- Compose multiple models or processing stages into a single deployable unit.
A minimal service definition looks like this:
```python
import bentoml

@bentoml.service(
    resources={"gpu": 1, "memory": "4Gi"},
    traffic={"timeout": 120},
)
class MyMLService:
    def __init__(self):
        import torch
        self.model = torch.load("model.pt")

    @bentoml.api
    def predict(self, input_text: str) -> str:
        return self.model(input_text)
```
Theoretical Basis
The service class definition pattern applies the decorator pattern from object-oriented design: a structural pattern that attaches additional responsibilities to an object dynamically without altering its interface.
The abstract pattern is as follows:
```
SERVICE_CLASS_DEFINITION(class T):
    DECORATOR @bentoml.service:
        1. Introspect class T for @bentoml.api methods
        2. Generate HTTP/gRPC route table from method signatures
        3. Capture resource/traffic/worker configuration
        4. Wrap T in Service[T] proxy
    LIFECYCLE:
        STARTUP:  T.__init__() executes once per worker process
        REQUEST:  Incoming HTTP -> deserialize -> T.method() -> serialize -> HTTP response
        SHUTDOWN: Worker process termination (graceful drain)
    COMPOSITION:
        Service[A] depends_on Service[B]
        -> A and B run in separate worker pools
        -> A calls B via inter-process RPC
```
Key theoretical properties:
- Encapsulation -- the user class contains only business logic; all serving infrastructure is injected by the decorator.
- Separation of concerns -- resource allocation, serialization, and routing are orthogonal to model inference logic.
- Composability -- services with declared dependencies form a DAG that BentoML deploys and orchestrates automatically.