Principle: BentoML Service Class Definition
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 15:00 GMT |
Overview
A design pattern for defining ML model serving endpoints as decorated Python classes. In BentoML, a service is a plain Python class transformed into a production-grade serving component via the @bentoml.service decorator, which adds HTTP/gRPC endpoint registration, resource configuration, and lifecycle management.
Description
Service class definition in BentoML is centered on the decorator pattern: the @bentoml.service decorator wraps a user-authored Python class in a Service[T] instance. This wrapper is responsible for:
- Endpoint registration -- methods decorated with `@bentoml.api` become HTTP (or gRPC) endpoints with automatic request/response serialization.
- Resource configuration -- CPU, memory, GPU, and concurrency limits are specified as keyword arguments to the decorator and enforced at runtime.
- Lifecycle management -- the class `__init__` method runs once per worker process at startup, making it the natural place to load models and initialize expensive resources.
- Dependency composition -- services can depend on other services, forming a directed acyclic graph that BentoML orchestrates across separate worker pools.
The decorator accepts optional parameters for naming, Docker image configuration, environment variables, labels, and a path prefix for URL routing. All resource and traffic configuration is passed via `**kwargs` conforming to the `ServiceConfig` TypedDict (covering the `resources`, `traffic`, and `workers` settings).
Usage
Use the @bentoml.service decorator when you need to:
- Expose one or more ML model inference functions as network-accessible API endpoints.
- Declare resource requirements (GPUs, memory) that BentoML uses for scheduling and containerization.
- Compose multiple models or processing stages into a single deployable unit.
A minimal service definition looks like this:
```python
import bentoml

@bentoml.service(
    resources={"gpu": 1, "memory": "4Gi"},
    traffic={"timeout": 120},
)
class MyMLService:
    def __init__(self):
        import torch
        self.model = torch.load("model.pt")

    @bentoml.api
    def predict(self, input_text: str) -> str:
        return self.model(input_text)
```
Theoretical Basis
The service class definition pattern applies the decorator pattern from object-oriented design: a structural pattern that attaches additional responsibilities to an object dynamically without altering its interface.
The abstract pattern is as follows:
```
SERVICE_CLASS_DEFINITION(class T):
    DECORATOR @bentoml.service:
        1. Introspect class T for @bentoml.api methods
        2. Generate HTTP/gRPC route table from method signatures
        3. Capture resource/traffic/worker configuration
        4. Wrap T in Service[T] proxy
    LIFECYCLE:
        STARTUP:  T.__init__() executes once per worker process
        REQUEST:  Incoming HTTP -> deserialize -> T.method() -> serialize -> HTTP response
        SHUTDOWN: Worker process termination (graceful drain)
    COMPOSITION:
        Service[A] depends_on Service[B]
        -> A and B run in separate worker pools
        -> A calls B via inter-process RPC
```
Key theoretical properties:
- Encapsulation -- the user class contains only business logic; all serving infrastructure is injected by the decorator.
- Separation of concerns -- resource allocation, serialization, and routing are orthogonal to model inference logic.
- Composability -- services with declared dependencies form a DAG that BentoML deploys and orchestrates automatically.